Scraping Baidu or other websites works fine, but scraping Google always fails with: java.net.SocketException: Connection reset
Here is the scraping code:
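import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

HttpURLConnection connection = null;
BufferedInputStream in = null;
BufferedReader read = null;
try {
    URL url = new URL(strURL);
    String cookie = "";
    int iii = 0; // attempt counter
    do {
        iii++;
        connection = (HttpURLConnection) url.openConnection();
        System.out.println("11111111:" + strURL);
        Thread.sleep(5000); // wait 5 seconds before each attempt
        if (cookie.length() != 0) {
            connection.setRequestProperty("Cookie", cookie);
        }
        connection.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 8.0)");
        // Do not follow redirects automatically, so the Set-Cookie header of a 302 can be read
        connection.setInstanceFollowRedirects(false);
        System.out.println("9999999999" + connection.getHeaderField("Set-Cookie"));
        int code = connection.getResponseCode();
        if (code == HttpURLConnection.HTTP_MOVED_TEMP) {
            String setCookie = connection.getHeaderField("Set-Cookie");
            if (setCookie != null) { // avoid appending the literal string "null;"
                cookie += setCookie + ";";
            }
        }
        // Stop on 200 OK, or give up after the 4th attempt
        if (code == HttpURLConnection.HTTP_OK || iii == 4) {
            break;
        }
    } while (true);
    in = new BufferedInputStream(connection.getInputStream());
    read = new BufferedReader(new InputStreamReader(in, "GB2312"));
    // ... (reading the page from read is not shown in the post)
} catch (IOException | InterruptedException e) {
    e.printStackTrace(); // the SocketException "Connection reset" is an IOException and lands here
} finally {
    if (connection != null) {
        connection.disconnect();
    }
}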
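Separately from the connection reset, the code above always decodes the response as GB2312, which may not match what the server actually sends. A minimal sketch, assuming you would rather take the charset from the response's Content-Type header and fall back to a charset of your choice when none is given. ResponseCharset and fromContentType are hypothetical names, not part of the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper: pick the decoding charset from a Content-Type value such as
// "text/html; charset=UTF-8" instead of hard-coding GB2312.
class ResponseCharset {

    /** Returns the charset named in contentType, or fallback if it is absent or unknown. */
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // IllegalCharsetNameException and UnsupportedCharsetException
                    // both extend IllegalArgumentException.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(fromContentType("text/html; charset=UTF-8", StandardCharsets.ISO_8859_1)); // prints UTF-8
        System.out.println(fromContentType("text/html", StandardCharsets.UTF_8));                      // prints UTF-8 (fallback)
    }
}
```

In the original snippet, the last line could then read: read = new BufferedReader(new InputStreamReader(in, ResponseCharset.fromContentType(connection.getContentType(), StandardCharsets.UTF_8)));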
This line does execute:
System.out.println("9999999999"+connection.getHeaderField("Set-Cookie"));
and prints: 9999999999null
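getHeaderField("Set-Cookie") returns null when the response has no such header, and returns only the last value when several are sent, so cookie += connection.getHeaderField("Set-Cookie") + ";" can end up appending the string "null;". In the JDK's built-in HTTP handler, getHeaderField may also return null instead of throwing when the request itself fails, so the printed null does not necessarily mean the server answered. A minimal null-safe sketch, assuming it is fed the map from connection.getHeaderFields(); CookieHeader and fromResponseHeaders are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: build a "Cookie" request header from every Set-Cookie
// response header, skipping missing values instead of appending "null;".
class CookieHeader {

    static String fromResponseHeaders(Map<String, List<String>> headers) {
        List<String> pairs = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            // HttpURLConnection stores the status line under a null key;
            // header names are compared case-insensitively.
            if (e.getKey() == null || !e.getKey().equalsIgnoreCase("Set-Cookie")) {
                continue;
            }
            for (String value : e.getValue()) {
                if (value == null || value.isEmpty()) {
                    continue;
                }
                // Keep only "name=value"; drop attributes such as path, expires, HttpOnly.
                int semi = value.indexOf(';');
                pairs.add((semi >= 0 ? value.substring(0, semi) : value).trim());
            }
        }
        return String.join("; ", pairs); // empty string if there were no cookies
    }

    public static void main(String[] args) {
        Map<String, List<String>> headers = new HashMap<>();
        headers.put("Set-Cookie", Arrays.asList("a=1; path=/", "b=2; HttpOnly"));
        System.out.println(fromResponseHeaders(headers)); // prints a=1; b=2
    }
}
```

Inside the loop this would become cookie = CookieHeader.fromResponseHeaders(connection.getHeaderFields()); and the existing cookie.length() != 0 check still skips the header when nothing was set.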
The search keyword in the URL has already been URL-encoded.
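If the keyword is placed into the query string by hand, the encoding step can be sketched as follows, assuming UTF-8 (the charset has to match what the target site expects). QueryUrl and build are hypothetical names, and the base URL and the parameter name q are only examples.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch: percent-encode a search keyword before appending it to a query string.
class QueryUrl {

    static String build(String base, String param, String keyword) {
        try {
            // URLEncoder turns spaces into '+' and non-ASCII bytes into %XX.
            return base + "?" + param + "=" + URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported by the JVM
        }
    }

    public static void main(String[] args) {
        // "\u722C\u866B" is the keyword 爬虫 written as Unicode escapes
        System.out.println(build("https://www.google.com/search", "q", "java \u722C\u866B"));
        // prints https://www.google.com/search?q=java+%E7%88%AC%E8%99%AB
    }
}
```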
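Could you show me your code, or take a look at what is wrong with mine? I have not configured a proxy server. My code scrapes other sites without any problem; only Google fails.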
Change the following line, or remove it altogether:
connection.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 8.0)");
That is, replace Mozilla/4.0 (compatible; MSIE 8.0) with a different string.
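A minimal sketch of that suggestion, with assumptions of its own: the replacement User-Agent string is only an example (the thread does not say which string works), the 10-second timeouts are arbitrary, and the body is decoded as UTF-8. FetchWithUserAgent, open and fetch are hypothetical names. A reset connection still surfaces as a java.net.SocketException (an IOException), typically from getResponseCode(), which the caller can catch and log.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class FetchWithUserAgent {

    // Example value only; any string other than the original MSIE 8.0 one could be tried.
    static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";

    /** Creates and configures the connection; openConnection() does not contact the server yet. */
    static HttpURLConnection open(URL url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setConnectTimeout(10_000); // assumed 10 s limit for connecting
        conn.setReadTimeout(10_000);    // assumed 10 s limit between reads
        return conn;
    }

    /** Performs the request and returns the body, decoded as UTF-8 (an assumption). */
    static String fetch(String address, String userAgent) throws IOException {
        HttpURLConnection conn = open(URI.create(address).toURL(), userAgent);
        try {
            int code = conn.getResponseCode(); // sends the request; a reset usually shows up here
            InputStream body = code < 400 ? conn.getInputStream() : conn.getErrorStream();
            if (body == null) {
                return "";
            }
            StringBuilder sb = new StringBuilder();
            try (BufferedReader r = new BufferedReader(new InputStreamReader(body, StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    sb.append(line).append('\n');
                }
            }
            return sb.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java FetchWithUserAgent <url>");
            return;
        }
        System.out.println(fetch(args[0], USER_AGENT));
    }
}
```

Removing the header entirely corresponds to simply not calling setRequestProperty("User-Agent", ...), in which case the JDK sends its own default User-Agent.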