A Beginner's Web Crawler Demo in Java

左手梦圆 · 2022-08-04


Requirement:

Crawl a web page (for example www.lianhehuishang.com), extract the URL addresses it contains, and save them to F:\spider_url.txt.


Program:

package com.zheng;

import java.io.BufferedReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WebSpider {
    public static void main(String[] args) {
        URL url = null;
        URLConnection urlconn = null;
        BufferedReader br = null;
        PrintWriter pw = null;
        // Crude pattern for absolute http:// links: the character class matches
        // word characters plus '+', '.', '?' and '/', and the match must end with
        // a dot followed by letters (e.g. ".com", ".html").
        String regex = "http://[\\w+\\.?/?]+\\.[A-Za-z]+";
        Pattern p = Pattern.compile(regex);
        try {
            url = new URL("http://www.lianhehuishang.com/");
            urlconn = url.openConnection();
            // Auto-flushing writer for the output file
            pw = new PrintWriter(new FileWriter("F:\\spider_url.txt"), true);
            br = new BufferedReader(new InputStreamReader(
                    urlconn.getInputStream()));
            String buf = null;
            // Read the page line by line and write every regex match to the file
            while ((buf = br.readLine()) != null) {
                Matcher buf_m = p.matcher(buf);
                while (buf_m.find()) {
                    pw.println(buf_m.group());
                }
            }
            System.out.println("Fetch succeeded!");
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Null checks guard against a NullPointerException if the connection
            // failed before the streams were opened.
            if (br != null) {
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            if (pw != null) {
                pw.close();
            }
        }
    }
}
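
For reference, on Java 7 or later the same logic can be written with try-with-resources, which closes the reader and writer automatically even if an exception is thrown, so no finally block is needed. The following is only a minimal sketch of that variant, assuming the same URL, regex, and output path as the program above (the class name WebSpiderTryWithResources is made up for illustration):

package com.zheng;

import java.io.BufferedReader;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WebSpiderTryWithResources {
    public static void main(String[] args) {
        // Same crude URL pattern as in the demo above
        Pattern p = Pattern.compile("http://[\\w+\\.?/?]+\\.[A-Za-z]+");
        // try-with-resources closes both streams automatically
        try (PrintWriter pw = new PrintWriter(new FileWriter("F:\\spider_url.txt"), true);
             BufferedReader br = new BufferedReader(new InputStreamReader(
                     new URL("http://www.lianhehuishang.com/").openStream()))) {
            String line;
            while ((line = br.readLine()) != null) {
                Matcher m = p.matcher(line);
                while (m.find()) {
                    pw.println(m.group());   // one matched URL per line
                }
            }
            System.out.println("Fetch succeeded!");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}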


Run result:

[Screenshot: console output]

Open F:\spider_url.txt:


[Screenshot: extracted URLs in spider_url.txt]

