The following code:

import java.io.BufferedReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class Application {
    public static void main(String[] args) throws IOException {
        final URL url = new URL("http://www.amazon.cn/s?rh=n:663227051");
        final String agentString = "Mozilla/5.0 (Windows; U; Windows NT 6.1; zh-CN; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)";
        URLConnection urlConnection = url.openConnection();
        // Pretend to be Firefox so the server treats the request like a browser's.
        urlConnection.setRequestProperty("User-Agent", agentString);
        InputStreamReader streamReader = new InputStreamReader(urlConnection.getInputStream());
        BufferedReader reader = new BufferedReader(streamReader);
        final String path = "Test.html";
        FileWriter writer = new FileWriter(path);
        String line;
        while ((line = reader.readLine()) != null)
            writer.append(line);
        writer.close(); // without close(), buffered output may never reach Test.html
        reader.close();
    }
}

After running it several times, I found that the fetched page sometimes shows "显示所有 5 个结果" ("Showing all 5 results") and sometimes "显示: 1-12条, 共29条" ("Showing 1-12 of 29"), whereas in a browser the result is always the latter no matter how many times I refresh. Why does this happen, and how can I make the scraped page always match what the browser gets?
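One thing worth checking, as a guess not confirmed in this thread: the query `rh=n:663227051` is sent with a raw colon, while a browser percent-encodes reserved characters before sending, so the server may not be seeing an identical request. The standard JDK `URLEncoder` shows what the browser-style encoding looks like; the `buildQuery` helper below is hypothetical, for illustration only:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryEncoding {
    // Hypothetical helper: percent-encode one key=value pair, browser-style.
    static String buildQuery(String key, String value) {
        try {
            return URLEncoder.encode(key, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e); // unreachable
        }
    }

    public static void main(String[] args) {
        // The raw query from the question; the colon becomes %3A.
        System.out.println(buildQuery("rh", "n:663227051")); // prints rh=n%3A663227051
    }
}
```

If encoding is the issue, the browser-equivalent URL would be http://www.amazon.cn/s?rh=n%3A663227051 rather than the raw form.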
Solutions »
import java.io.*;
import java.net.*;

public class Test {
    public static void main(String[] args) {
        try {
            // Percent-encode the query pair the way a browser would (":" becomes "%3A").
            String line = String.format("%s=%s",
                    URLEncoder.encode("rh", "UTF-8"),
                    URLEncoder.encode("n:663227051", "UTF-8"));
            final URL url = new URL("http://www.amazon.cn/s");
            URLConnection urlConnection = url.openConnection();
            urlConnection.setDoOutput(true); // switches the request to POST
            OutputStreamWriter osw = new OutputStreamWriter(urlConnection.getOutputStream());
            osw.write(line);
            osw.flush();
            osw.close();
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(urlConnection.getInputStream()));
            final String path = "Test.html";
            PrintWriter writer = new PrintWriter(new FileWriter(path));
            String data;
            while ((data = reader.readLine()) != null)
                writer.println(data);
            writer.close();
            reader.close();
        } catch (IOException e) { // MalformedURLException is a subclass of IOException
            e.printStackTrace();  // don't swallow exceptions silently
        }
    }
}
I wrote this following http://www.exampledepot.com/egs/java.net/Post.html.
2009-12-09 07:14 1,661 Test.class
2009-12-09 07:14 124,857 Test.html
2009-12-09 07:14 877 Test.java
I test-ran it and got a page with five search results.
After disabling redirects with urlConnection.setInstanceFollowRedirects(false); (this requires casting the connection to HttpURLConnection), the resulting Test.html was 0 bytes.
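My reading of the 0-byte result, as an assumption not verified against Amazon: the server answers the POST with a 3xx redirect whose body is empty, and only following the redirect yields the actual page. Casting to `HttpURLConnection` lets you inspect `getResponseCode()` and the `Location` header; the `isRedirect` helper below is hypothetical, while the status constants are standard JDK:

```java
import java.net.HttpURLConnection;

public class RedirectCheck {
    // Hypothetical helper: any 3xx status is a redirect. With
    // setInstanceFollowRedirects(false) a redirect response typically has
    // an empty body, which would explain the 0-byte Test.html.
    static boolean isRedirect(int status) {
        return status >= 300 && status < 400;
    }

    public static void main(String[] args) {
        System.out.println(isRedirect(HttpURLConnection.HTTP_MOVED_TEMP)); // 302 -> true
        System.out.println(isRedirect(HttpURLConnection.HTTP_OK));         // 200 -> false
    }
}
```

In the live check you would call `int status = ((HttpURLConnection) urlConnection).getResponseCode();` and print `urlConnection.getHeaderField("Location")` to see where the server is sending you.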
package test;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

public class TestURLReader {
    public static void main(String[] args) {
        BufferedWriter bw = null;
        try {
            for (int i = 0; i < 20; i++) {
                File f = new File("F:/test" + i + ".html");
                URL url = new URL(
                        "http://www.amazon.cn/s/qid=1260384036/ref=sr_nr_n_0?ie=UTF8&rs=659402051&bbn=659402051&rnid=659402051&rh=n%3A658390051%2Cn%3A!658391051%2Cn%3A658394051%2Cn%3A658514051%2Cn%3A659402051%2Cn%3A663276051");
                BufferedReader buf = new BufferedReader(new InputStreamReader(
                        url.openStream(), "UTF-8"));
                Thread.sleep(2000); // pause between requests
                System.out.println("sleep ok!");
                String str;
                String all = "";
                bw = new BufferedWriter(new FileWriter(f, false));
                while ((str = buf.readLine()) != null) {
                    all += str;
                }
                bw.write(all);
                bw.close();
                buf.close();
            }
        } catch (Exception e1) {
            e1.printStackTrace();
        } finally {
            if (bw != null) { // guard: bw stays null if the first connection fails
                try {
                    bw.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            System.exit(0);
        }
    }
}
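A side note on the download loop above: `all += str` re-copies the whole accumulated page on every line, which is quadratic in page size (Test.html above is about 120 KB). `StringBuilder` appends in amortized linear time; a minimal sketch, where the `joinLines` helper is illustrative rather than from the thread:

```java
public class Accumulate {
    // Concatenate lines with StringBuilder instead of repeated String +=,
    // avoiding a full copy of the accumulated text on every append.
    static String joinLines(String[] lines) {
        StringBuilder all = new StringBuilder();
        for (String line : lines) {
            all.append(line);
        }
        return all.toString();
    }

    public static void main(String[] args) {
        String[] page = {"<html>", "<body>ok</body>", "</html>"};
        System.out.println(joinLines(page)); // prints <html><body>ok</body></html>
    }
}
```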
But using your code to fetch http://www.amazon.cn/s?rh=n:663227051 , I still only get 4 results.
try {
    final HtmlPage page = client.getPage("http://www.amazon.cn/s?rh=n:663227051");
    final HtmlDivision resultCount = page.getHtmlElementById("resultCount");
    System.out.println(resultCount.getTextContent());
    //page.save(new File("amazoncn.html"));
    //FileWriter writer = new FileWriter(new File("amazoncn.html"));
    //writer.write(page.getWebResponse().getContentAsString());
    //writer.close();
} catch (IOException e) {
    e.printStackTrace();
}

WebClient, HtmlPage, and HtmlDivision come from HtmlUnit 2.6.
Most of the time it prints

[java] 显示: 1-12条, 共29条  ("Showing 1-12 of 29")

and occasionally

[java] 显示所有 4 个结果  ("Showing all 4 results")

Yeah, that's why I said the problem is with the URL link (链接). I probably typed 链接 ("link") as 连接 ("connection") and confused you, heh.