Newbie asking for help: how do I scrape the data I want from a web page according to certain criteria?

Take the page http://www.appannie.com/top/iphone/united-states/games/ as an example. I want to extract the names of all games in the FREE column whose rank has risen by more than 30. How should I do that? It looks like jsoup can do it, but I have gone through a lot of examples and can't really follow them.

Solution »

jsoup connects and then drops the connection right away, which a server can easily mistake for a network attack, so it answers with a 503 error. The site the OP mentions can't be scraped directly with jsoup. What you can do is save the page to local disk with HttpClient first, and then parse the saved file with jsoup. The code below does the two steps in that order.

Follow-up question on the code below: what if I want to save the data to a local txt file instead of printing it? With a bw (BufferedWriter)?
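import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.MalformedURLException;
import java.nio.charset.StandardCharsets;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Step 1: download the page with Commons HttpClient 3.x and save it to local disk.
HttpClient client = new HttpClient();
String htmlurl = "http://www.appannie.com/top/iphone/united-states/games/";
System.out.println(htmlurl);
HttpMethod method = new GetMethod(htmlurl);
try {
    int status = client.executeMethod(method);
    System.out.println(method.getStatusLine());
    if (status == HttpStatus.SC_OK) { // do not save an error page such as a 503
        String html = method.getResponseBodyAsString();
        // Write as UTF-8 so the file matches the charset given to Jsoup.parse in step 2.
        try (Writer fw = new OutputStreamWriter(new FileOutputStream("C:\\download\\Top Charts - iPhone - United States - Games App Annie.htm"), StandardCharsets.UTF_8)) {
            fw.write(html);
        }
    }
} catch (IOException e) { // HttpException is a subclass of IOException
    e.printStackTrace();
} finally {
    method.releaseConnection(); // always hand the connection back
}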
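// Step 2: parse the saved file with jsoup and print the matching game names.
try {
    // Parsing the URL directly is what gets rejected by this site:
    //URL url = new URL("http://www.appannie.com/top/iphone/united-states/games/");
    //Document doc = Jsoup.parse(url, 3000000);
    File f = new File("C:\\download\\Top Charts - iPhone - United States - Games App Annie.htm");
    Document doc = Jsoup.parse(f, "UTF-8");
    Elements tables = doc.select("table");
    Element table = tables.get(1); // this answer takes the second table on the page as the chart
    Elements trs = table.getElementsByTag("tr");
    for (Element tr : trs) {
        Elements tds = tr.children();
        if (tds.size() < 3) {
            continue; // header or malformed row
        }
        Element td = tds.get(2); // the Free column
        Elements span = td.getElementsByTag("span");
        Elements a = td.getElementsByTag("a");
        if (span.isEmpty() || a.isEmpty()) {
            continue; // no rank change or no game link in this cell
        }
        String content = span.get(0).text();
        if (content.contains("\u25b2")) { // \u25b2 is the up triangle; the down triangle is \u25bc
            String up = content.replace("\u25b2", "").trim();
            // The question asks for a rise of more than 30, so compare with > rather than >=.
            if (up.matches("\\d+") && Integer.parseInt(up) > 30) {
                System.out.println(a.get(0).text());
            }
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}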
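
On the follow-up about saving to a txt file instead of printing: yes, a BufferedWriter is a reasonable fit. Below is a minimal sketch that uses only the JDK. The class name RisersToFile, its method names, the output file risers.txt and the sample rows are all made up for illustration and are not from the thread. In the parsing code above you would add each name to a list where it currently calls System.out.println, then pass that list to saveNames.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; every name in it is illustrative.
class RisersToFile {

    // Returns the rise encoded in a label made of an up triangle followed by
    // digits, or -1 if the label has no up triangle or is not a plain number.
    static int parseRise(String label) {
        String s = label.trim();
        if (!s.startsWith("\u25b2")) {
            return -1;
        }
        try {
            return Integer.parseInt(s.substring(1).trim());
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Keeps the names whose rise is strictly greater than the threshold.
    static List<String> filterRisers(Map<String, String> labelsByName, int threshold) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, String> e : labelsByName.entrySet()) {
            if (parseRise(e.getValue()) > threshold) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    // Writes one name per line as UTF-8, replacing any existing file.
    static void saveNames(List<String> names, Path out) throws IOException {
        try (BufferedWriter bw = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (String name : names) {
                bw.write(name);
                bw.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample rows standing in for the scraped table cells.
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("Game A", "\u25b235");
        rows.put("Game B", "\u25bc12");
        rows.put("Game C", "\u25b230");
        saveNames(filterRisers(rows, 30), Paths.get("risers.txt"));
    }
}
```

The try-with-resources block closes the writer even when a write fails, so no explicit close() call is needed. Collecting the names first and writing them in one go also keeps the file handling out of the parsing loop.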