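Level 4: This is the first level that actually recommends using urllib, which is no problem. Viewing the page source reveals this comment: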
urllib may help. DON’T TRY ALL NOTHINGS, since it will never end. 400 times is more than enough.
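Clicking the image follows a link: url=http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345. Replace 12345 with the next nothing value shown on the page, 44827, and repeat a few times until the pattern becomes clear, handling each exception in the chain as it comes up. To save time, each rerun can start from the last number before the exception; only the nothing value in the URL needs to change.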
First exception: Yes. Divide by two and keep going. Here, divide the previous value by 2.
Second exception: There maybe misleading numbers in the text. One example is 82683. Look only for the next nothing and the next nothing is 63579
Third exception: peak.html, which is the URL of the next level.
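The manual steps above can be automated. The following is a minimal sketch, not the author's code: the names `fetch_text`, `next_nothing`, and `follow_chain` are made up for illustration, and it assumes every normal page contains the phrase "next nothing is" followed by the number, as in the messages quoted above. The fetcher is passed in as a parameter so the loop can be tried without touching the network.

```python
import re
import urllib.request

BASE_URL = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing='


def fetch_text(nothing):
    """Download the page body for one nothing value (real HTTP request)."""
    with urllib.request.urlopen(BASE_URL + str(nothing)) as resp:
        return resp.read().decode('utf-8')


def next_nothing(text, current):
    """Return the next nothing value, or None when the chain ends."""
    # Only trust the number after "next nothing is"; other numbers may be decoys.
    match = re.search(r'next nothing is (\d+)', text)
    if match:
        return match.group(1)
    # Assumed handling of the "Divide by two" page: halve the current value.
    if 'Divide by two' in text:
        return str(int(current) // 2)
    return None


def follow_chain(start, fetch=fetch_text, max_steps=400):
    """Follow the chain from start; return (last_nothing, last_text)."""
    current = str(start)
    text = ''
    for _ in range(max_steps):  # the hint says 400 steps is more than enough
        text = fetch(current)
        nxt = next_nothing(text, current)
        if nxt is None:
            break
        current = nxt
    return current, text
```

Calling `follow_chain(12345)` issues real requests one after another; passing a dictionary lookup as `fetch` exercises the same loop offline.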
- Fetch the page source
import urllib.request

def get_html_page(url):
    # Return the decoded page body, or None if the status is not 200
    page = None
    resp = urllib.request.urlopen(url)
    if resp.status == 200:
        page = resp.read().decode('utf-8')
    return page
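- Extract the comment
import re

def get_comments(page):
    # Collect the text of every HTML comment; re.S lets '.' span newlines
    rs = re.findall(r'<!--\s*(.*?)\s*-->', page, re.S)
    print(rs)
    if rs:
        return rs[0]  # note: index 0 here
    return None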
- Main function
def main():
    url = 'http://www.pythonchallenge.com/pc/def/equality.html'
    page = get_html_page(url)
    comments = get_comments(page)
    print(comments)
    # Lowercase letters with exactly three capitals on each side
    rs = re.findall(r'[^A-Z][A-Z]{3}([a-z])[A-Z]{3}[^A-Z]', comments, re.S)
    print("".join(rs))

if __name__ == '__main__':
    main()
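To see what the comment-extraction pattern returns without making a request, here is a small sketch run against a made-up page. The `sample_page` string is invented for illustration and is not the real challenge page; only the regular expression is taken from `get_comments` above.

```python
import re

# Made-up page with two comments; the second spans several lines.
sample_page = """<html>
<!-- urllib may help. -->
<body>hello</body>
<!--
second comment
-->
</html>"""

# Same pattern as get_comments: lazy match between the comment delimiters,
# with surrounding whitespace trimmed by the \s* on each side.
comments = re.findall(r'<!--\s*(.*?)\s*-->', sample_page, re.S)
first_comment = comments[0] if comments else None
print(comments)
print(first_comment)
```

Index 0 selects the first comment on the page, so a page whose hint sits in a later comment needs a different index.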
Next level URL: http://www.pythonchallenge.com/pc/def/peak.html