[gavin@localhost get_movie]$ python get_movie.py
Input the link of QQ movie
http://v.qq.com/list/2_-1_-1_-1_1_0_0_20_-1_-1.html
http://v.qq.com/list/2_-1_-1_-1_1_0_0_20_-1_-1.html
刀客家族的女人
http://v.qq.com/cover/3/3x0x6czcrvedphk.html
生活启示录
http://v.qq.com/cover/z/zvqk3fww8130zu3.html
产科男医生
http://v.qq.com/cover/k/k4tffs6sdczjkur.html
你们被包围了
http://v.qq.com/cover/n/nijiw7wrm0ubp0z.html
Triangle三角
http://v.qq.com/cover/q/q0c1lhfa4umqbw3.html
加油爱人
http://v.qq.com/cover/6/614gfunx9aode39.html
飞哥大英雄
http://v.qq.com/cover/o/ozdulcozhtj3hlx.html
金玉良缘
http://v.qq.com/cover/e/ej4pj00outhp38n.html
诛三计
http://v.qq.com/cover/4/42qmq5prq32tfgb.html
如果我爱你
http://v.qq.com/cover/n/no0xky7q8phhkc6.html
密使2之江都谍影
http://v.qq.com/cover/7/7aue7d27yearp19.html
小宝和老财
http://v.qq.com/cover/w/wlunt0wb380jthm.html
咱们结婚吧
http://v.qq.com/cover/b/blf9ksy1ulf6z33.html
步步惊情
http://v.qq.com/cover/i/ikibji2k73dqazu.html
大当家
http://v.qq.com/cover/7/765s6mwbdbpb3ep.html
宫锁连城
http://v.qq.com/cover/n/npzqf0vd4i5nqjh.html
铁血独立营
http://v.qq.com/cover/r/r8m1s4yzu1wc1hl.html
我爱男闺蜜
http://v.qq.com/cover/6/6hbwvj85sic930l.html
薛丁山
http://v.qq.com/cover/2/2qq1c1iqph9qpxd.html
一仆二主
http://v.qq.com/cover/a/anlxed56zwbpm16.html
The script can also generate an HTML page.
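As a rough illustration of the HTML idea, the name/link pairs printed above could be written out as a small page of links. This is only a sketch in Python 3 (the original script is Python 2), not the author's code: `build_movie_page` and its `title` parameter are hypothetical names, and the two sample entries are copied from the output above.

```python
from html import escape


def build_movie_page(movies, title="Tencent Movie"):
    """Return an HTML page that lists (name, link) pairs as clickable links.

    Hypothetical helper for illustration; names and links are escaped so
    that characters such as & or < do not break the markup.
    """
    items = "\n".join(
        '<li><a href="{0}">{1}</a></li>'.format(escape(link, quote=True), escape(name))
        for name, link in movies
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>{0}</title></head>\n'
        "<body><ul>\n{1}\n</ul></body></html>"
    ).format(escape(title), items)


# Two entries taken from the sample output above.
movies = [
    ("生活启示录", "http://v.qq.com/cover/z/zvqk3fww8130zu3.html"),
    ("产科男医生", "http://v.qq.com/cover/k/k4tffs6sdczjkur.html"),
]
page = build_movie_page(movies)
print(page)
```

Redirecting the printed text to a file (for example `movies.html`) would give a page that opens in a browser.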
Here is the code:
#!/usr/bin/env python
# coding=utf-8
#########################################
#> File Name: get_movie.py
#> Author: nealgavin
#> Mail: nealgavin@126.com
#> Created Time: Sat 24 May 2014 09:03:58 PM CST
#########################################
import re
import urllib2
import BeautifulSoup

NUM = 0           # global: number of movies
m_type = u''      # movie type
m_site = u'qq'    # movie site


def gethtml(url):
    """Fetch the HTML source of a page."""
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    html = response.read()
    return html


def out_Chinese(tags):
    """Print each item on its own line."""
    for tag in tags:
        print str(tag)


def gettags(html):
    """Return the inner HTML of every <h6 class="scores"> in the movie grid."""
    soup = BeautifulSoup.BeautifulSoup(html)
    tags_all = soup.findAll('div', {'class': 'grid_18'})
    if not tags_all:
        return []  # page layout not recognized
    p = re.compile(r'<h6 class="scores">(.+?)</h6>', re.DOTALL)
    return p.findall(str(tags_all[0]))


def buildMovieName(url):
    """Print the movie entries wrapped in a minimal HTML page."""
    print '<html><head><meta charset="utf-8"><title>Tencent Movie</title></head><body>'
    out_Chinese(gettags(gethtml(url)))
    print '</body></html>'


def getMovieName(url):
    """Print and return a flat list: name, link, name, link, ..."""
    tags = gettags(gethtml(url))
    movieNames = []
    match = re.compile(r'<a href="(.+?)" title="(.+?)" target="(.*?)">')
    for tag in tags:
        found = match.findall(tag)
        if not found:
            continue  # skip entries without the expected link markup
        link, name = found[0][0], found[0][1]
        movieNames.append(name)  # movie name
        movieNames.append(link)  # movie link
    out_Chinese(movieNames)
    return movieNames


url = raw_input("Input the link of QQ movie\n")
if not url:
    # fall back to a default list page when nothing is entered
    url = 'http://v.qq.com/list/1_-1_-1_-1_2_0_0_20_0_-1_0.html'
print url
getMovieName(url)
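
The script above depends on Python 2, BeautifulSoup 3, and a regular expression that expects `href`, `title`, and `target` in exactly that order. As an alternative, here is a minimal sketch of the same extraction step in Python 3 using only the standard library's `html.parser`. It is not the author's method. `MovieLinkParser` is a hypothetical class name, and the `SAMPLE` markup is an assumption reconstructed from the patterns the script searches for (`<h6 class="scores">` containing an `<a href=... title=...>`); the real page may differ.

```python
from html.parser import HTMLParser


class MovieLinkParser(HTMLParser):
    """Collect (title, href) pairs from <a> tags inside <h6 class="scores">."""

    def __init__(self):
        super().__init__()
        self.in_h6 = False   # True while inside an <h6 class="scores"> element
        self.movies = []     # collected (title, href) pairs, in page order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h6" and "scores" in (attrs.get("class") or "").split():
            self.in_h6 = True
        elif tag == "a" and self.in_h6:
            href, title = attrs.get("href"), attrs.get("title")
            if href and title:
                self.movies.append((title, href))

    def handle_endtag(self, tag):
        if tag == "h6":
            self.in_h6 = False


# Assumed markup, modeled on what the regular expressions above look for.
SAMPLE = '''
<div class="grid_18">
  <h6 class="scores"><a href="http://v.qq.com/cover/z/zvqk3fww8130zu3.html" title="生活启示录" target="_blank">生活启示录</a></h6>
  <h6 class="scores"><a target="_blank" title="产科男医生" href="http://v.qq.com/cover/k/k4tffs6sdczjkur.html">产科男医生</a></h6>
  <a href="http://v.qq.com/" title="home">not a movie entry</a>
</div>
'''

parser = MovieLinkParser()
parser.feed(SAMPLE)
for title, href in parser.movies:
    print(title)
    print(href)
```

Because the parser reads attributes by name, the second sample entry is still picked up even though its attributes appear in a different order, which the regular expression in the script would miss.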