Crawling Wikipedia

哈哈镜6567 2022-08-02 77 views

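# coding=utf-8
# __author__ = 'DouYunQian'

import re
from urllib import request

from bs4 import BeautifulSoup

# Download the Wikipedia main page and decode the response as UTF-8.
rep = request.urlopen("https://en.wikipedia.org/wiki/Main_Page").read().decode("utf-8")

soup = BeautifulSoup(rep, "html.parser")

# Visit every <a> tag whose href starts with /wiki/ (internal Wikipedia links).
for line in soup.find_all("a", href=re.compile("^/wiki/")):
    # Skip links that point to JPEG image files.
    if re.search(r"\.(jpg|JPG)$", line["href"]):
        continue
    print(line.get_text(), "<---->", "https://en.wikipedia.org" + line["href"])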
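
The link filtering in the loop above can also be tried offline, without fetching Wikipedia. The following is a minimal sketch, not the author's method: it swaps BeautifulSoup for the standard library's html.parser so it runs with no third-party packages. The names WikiLinkParser, extract_wiki_links, and SAMPLE_HTML are hypothetical, and the wider list of image extensions is an assumption that goes beyond the original jpg/JPG check.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://en.wikipedia.org"
# Assumption: skip several common image extensions, not only .jpg/.JPG.
IMAGE_RE = re.compile(r"\.(jpg|jpeg|png|gif|svg)$", re.IGNORECASE)


class WikiLinkParser(HTMLParser):
    """Collect (text, absolute_url) pairs for /wiki/ links, skipping image files."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the currently open <a>, if it qualifies
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("/wiki/") and not IMAGE_RE.search(href):
            self._href = href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.links.append((text, urljoin(BASE_URL, self._href)))
            self._href = None


def extract_wiki_links(html):
    """Return the filtered links found in an HTML string."""
    parser = WikiLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<a href="/wiki/Python_(programming_language)">Python</a>
<a href="/wiki/File:Example.JPG">photo</a>
<a href="https://example.org/">external</a>
<a href="/wiki/Web_scraping">Web <b>scraping</b></a>
"""

if __name__ == "__main__":
    for text, url in extract_wiki_links(SAMPLE_HTML):
        print(text, "<---->", url)
```

Keeping the filter in a function that takes an HTML string makes it possible to check the rules against a small sample before pointing the script at the live site.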
