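I wrote a small multithreaded program that downloads web pages. It reads a list of page URLs from a database and uses 10 threads to download the HTML of each page: URLs 1, 11, 21 ... 1001 ... go to thread 1, URLs 2, 12, 22 ... 1002 ... go to thread 2, and so on.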
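The split described above can be sketched in isolation as follows. This is only an illustration: the class name StridePartition and the method indicesFor are made up here, and positions are counted from 0 (as list indexes are) rather than from 1 as in the description.

```java
import java.util.ArrayList;
import java.util.List;

class StridePartition {
    // Returns the list positions handled by one worker:
    // worker, worker + workers, worker + 2 * workers, ...
    static List<Integer> indicesFor(int worker, int workers, int total) {
        List<Integer> out = new ArrayList<Integer>();
        for (int i = worker; i < total; i += workers) {
            out.add(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // With 10 workers and 25 URLs, worker 0 gets positions 0, 10, 20.
        System.out.println(indicesFor(0, 10, 25)); // prints [0, 10, 20]
        System.out.println(indicesFor(7, 10, 25)); // prints [7, 17]
    }
}
```

Every position lands in exactly one worker's list, so no URL is downloaded twice and none is skipped.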
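The program used to run without problems on one of my Linux machines, but I recently rented a new machine, and running it there showed a strange problem:
The program starts normally, but after fetching somewhere between 100 and 200 pages (that is, a bit more than 10 pages per thread) it fails. The error is below (every thread dies with this same error):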
java.net.ProtocolException: 0
at gnu.java.net.protocol.http.Request.readResponse(libgcj.so.7rh)
at gnu.java.net.protocol.http.Request.dispatch(libgcj.so.7rh)
at gnu.java.net.protocol.http.HTTPURLConnection.connect(libgcj.so.7rh)
at gnu.java.net.protocol.http.HTTPURLConnection.getInputStream(libgcj.so.7rh)
at thread.run(thread.java:65)
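I have tried several different JDK versions, but it made no difference and the error remains. I suspect a system setting is to blame, but I have checked many places and cannot find anything wrong. I hope someone experienced can help. Thanks!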
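The frames in the trace are in gnu.java.net.protocol.http and libgcj.so.7rh, which are GNU Classpath / GCJ names rather than Sun JDK ones, so it may be worth confirming which runtime the java command on the new machine actually starts. A minimal sketch for that check, using only standard system properties; the class name RuntimeCheck is made up for this example.

```java
class RuntimeCheck {
    // Builds a short report of the standard system properties
    // that identify the JVM running this code.
    static String describe() {
        String[] keys = { "java.vendor", "java.version", "java.vm.name", "java.home" };
        StringBuilder sb = new StringBuilder();
        for (String key : keys) {
            sb.append(key).append(" = ").append(System.getProperty(key)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Run it with the same java command that runs the crawler. If the output still names a GNU runtime after a JDK was installed, the newly installed JDK is probably not the one being picked up from the PATH.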
import java.io.*;
import java.net.*;
import java.sql.*;
import java.util.*;

public class thread extends Thread
{
    String site_n;
    int s_at;
    int e_at;
    int for_id;

    thread(String site_n, int s_at, int e_at, int for_id) {
        this.site_n = site_n; // site name
        this.s_at = s_at;     // start (first url_id)
        this.e_at = e_at;     // end (last url_id)
        this.for_id = for_id; // worker index, used to split the URLs between threads
    }
    public synchronized void run() {
        String site = site_n;
        int fornum = for_id;
        int start = s_at;
        int end = e_at;
        //String site_name = "http://www."+site+".com";
        ArrayList url_list = new ArrayList();
        ArrayList url_id_list = new ArrayList();
        // Load the URLs and their ids for the requested url_id range.
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String getconn = "jdbc:mysql://localhost/" + site + "?user=xxxx&password=xxxx";
            Connection conn = DriverManager.getConnection(getconn);
            PreparedStatement stmt = conn.prepareStatement("select url,url_id from urls where url_id>=? and url_id<=?");
            stmt.setInt(1, start);
            stmt.setInt(2, end);
            ResultSet rs = stmt.executeQuery();
            while (rs.next())
            {
                url_id_list.add(rs.getString(2));
                url_list.add(rs.getString(1));
            }
            rs.close();
            stmt.close();
            conn.close();
        }
        catch (Exception e) {
            e.printStackTrace();
        }
        // Each worker handles every 10th URL, starting at its own index.
        try {
            InputStream in = null;
            InputStreamReader rd = null;
            BufferedReader br = null;
            for (int i = fornum; i < url_list.size(); i = i + 10) {
                String save_dir = "./" + String.valueOf(Integer.parseInt(url_id_list.get(i).toString()) / 10000) + "/";
                try {
                    if (!(new File(save_dir).isDirectory()))
                        new File(save_dir).mkdir();
                }
                catch (Exception exp) {
                    exp.printStackTrace();
                }
                String html = "";
                URL this_url = new URL(url_list.get(i).toString());
                // This is the line that throws, but only on one of my machines; all the others run without errors.
                in = this_url.openConnection().getInputStream();
                rd = new InputStreamReader(in);
                br = new BufferedReader(rd);
                String line = br.readLine();
                int img = 0;
                while (line != null) {
                    html += line + (char)13;
                    /////////////////
                    //  get image  //
                    /////////////////
                    if (line.indexOf("product image(s) bof") > 0)
                    {
                        img = 1;
                    }
                    if (line.indexOf("product image(s) eof") > 0)
                    {
                        img = 0;
                    }
                    if (img == 1 && line.indexOf("img src") >= 0)
                    {
                        int bofimg = line.indexOf("images/");
                        int eofimg = line.indexOf("jpg\"") + 3;
                        // Only cut out the image path when both markers were found.
                        if (bofimg >= 0 && eofimg > bofimg) {
                            String imgURL = "http://www." + site + ".com/" + line.substring(bofimg, eofimg);
                            String imgdir = save_dir + url_id_list.get(i).toString() + ".jpg";
                            getpic gp = new getpic();
                            gp.crawlpic(imgURL, imgdir);
                        }
                    }
                    line = br.readLine();
                }
                br.close();
                rd.close();
                in.close();
                if (html == null)
                    html = "";
                if (html.length() > 0) {
                    String saveTo = save_dir + url_id_list.get(i).toString() + ".html";
                    try {
                        new outPut(html, saveTo);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                    System.out.println((i + start) + ". Saved " + url_list.get(i) + " as " + (i + start) + ".html");
                }
                else
                    System.out.println((i + start) + ". failed at " + url_list.get(i));
            }
            if (br != null)
                br.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    public static void main(String[] args) {
        if (args.length != 3) {
            System.out.println("Usage: java crawl_html [site name] [start at] [end at]");
        }
        else
        {
            String v1 = args[0];
            int v2 = Integer.parseInt(args[1]);
            int v3 = Integer.parseInt(args[2]);
            // Start 10 workers; worker n takes URLs n, n+10, n+20, ...
            for (int n = 0; n < 10; n++) {
                new thread(v1, v2, v3, n).start();
            }
        }
    }
}
Also: this is all the error output there is. With 10 threads, I get 10 copies of the same error message:
9. Saved http://www.handhelditems.com/camera-and-photo-c-4807.html as 9.html
5. Saved http://www.handhelditems.com/cellular-c-3524.html as 5.html
4. Saved http://www.handhelditems.com/giftideas.php as 4.html
...
...
...
107. Saved http://www.handhelditems.com/create_account.php as 107.html
115. Saved http://www.handhelditems.com/index.php as 115.html
114. Saved http://www.handhelditems.com/discounts.php as 114.html
101. Saved http://www.handhelditems.com/promotion.php as 101.html
126. Saved http://www.handhelditems.com/motorola-motorola-razr-c-3524_4179_4829.html as 126.html
java.net.ProtocolException: 0
at gnu.java.net.protocol.http.Request.readResponse(libgcj.so.7rh)
at gnu.java.net.protocol.http.Request.dispatch(libgcj.so.7rh)
at gnu.java.net.protocol.http.HTTPURLConnection.connect(libgcj.so.7rh)
at gnu.java.net.protocol.http.HTTPURLConnection.getInputStream(libgcj.so.7rh)
at thread.run(thread.java:65)
java.io.IOException: Connection reset by peer
at java.io.BufferedInputStream.refill(libgcj.so.7rh)
at java.io.BufferedInputStream.read(libgcj.so.7rh)
at gnu.java.net.protocol.http.LimitedLengthInputStream.read(libgcj.so.7rh)
at java.io.BufferedInputStream.refill(libgcj.so.7rh)
at java.io.BufferedInputStream.read(libgcj.so.7rh)
at getpic.crawlpic(getpic.java:14)
at thread.run(thread.java:91)
117. Saved http://www.handhelditems.com/rma_request.php as 117.html
111. Saved http://www.handhelditems.com/security.php as 111.html
119. Saved http://www.handhelditems.com/slingshot-animal-combo-monkey-chicken-duck-frog-p-5065.html?action=buy_now&sort=2a as 119.html
java.net.ProtocolException: 0
at gnu.java.net.protocol.http.Request.readResponse(libgcj.so.7rh)
at gnu.java.net.protocol.http.Request.dispatch(libgcj.so.7rh)
at gnu.java.net.protocol.http.HTTPURLConnection.connect(libgcj.so.7rh)
at gnu.java.net.protocol.http.HTTPURLConnection.getInputStream(libgcj.so.7rh)
at thread.run(thread.java:65)
129. Saved http://www.handhelditems.com/cellular-universal-accessories-c-3524_5135.html as 129.html
127. Saved http://www.handhelditems.com/cellular-garmin-ique-c-3524_5138.html as 127.html
118. Saved http://www.handhelditems.com/slingshot-animal-combo-monkey-chicken-duck-frog-p-5065.html as 118.html
128. Saved http://www.handhelditems.com/cellular-htc-c-3524_5132.html as 128.html
136. Saved http://www.handhelditems.com/ipod-shuffle-generation-charger-green-p-7596.html as 136.html
139. Saved http://www.handhelditems.com/ipod-shuffle-generation-charger-orange-p-7597.html as 139.html
137. Saved http://www.handhelditems.com/ipod-shuffle-generation-charger-pink-p-7598.html as 137.html
124. Saved http://www.handhelditems.com/radio-control-mini-hovercraft-2pcs-combo-p-7659.html as 124.html
149. Saved http://www.handhelditems.com/ipod-1800mah-backup-battery-charger-adapter-p-5591.html?action=buy_now&sort=2a as 149.html
java.net.ProtocolException: 0
at gnu.java.net.protocol.http.Request.readResponse(libgcj.so.7rh)
at gnu.java.net.protocol.http.Request.dispatch(libgcj.so.7rh)
at gnu.java.net.protocol.http.HTTPURLConnection.connect(libgcj.so.7rh)
at gnu.java.net.protocol.http.HTTPURLConnection.getInputStream(libgcj.so.7rh)
at thread.run(thread.java:65)
138. Saved http://www.handhelditems.com/ipod-shuffle-generation-charger-white-p-7232.html as 138.html
134. Saved http://www.handhelditems.com/apple-ipod-shuffle-apple-ipod-shuffle-chargers-adapters-c-4_1861_1868.html as 134.html
147. Saved http://www.handhelditems.com/ipod-home-theater-dock-p-7743.html?action=buy_now&sort=2a as 147.html
146. Saved http://www.handhelditems.com/ipod-home-theater-dock-p-7743.html as 146.html
144. Saved http://www.handhelditems.com/deep-bass-vibe-earphone-white-p-7545.html as 144.html
157. Saved http://www.handhelditems.com/ipod-charger-adapter-black-p-4426.html as 157.html
156. Saved http://www.handhelditems.com/index.php?cPath=4676&sort=2a&action=buy_now&products_id=4423 as 156.html
167. Saved http://www.handhelditems.com/isolate-hifi-earphone-blue-p-5048.html as 167.html
148. Saved http://www.handhelditems.com/ipod-1800mah-backup-battery-charger-adapter-p-5591.html as 148.html
166. Saved http://www.handhelditems.com/ipod-video-soft-leather-wallet-black-p-7562.html?action=buy_now&sort=2a as 166.html
154. Saved http://www.handhelditems.com/ipod-bubblegum-transmitter-green-p-5080.html as 154.html
java.net.ProtocolException: 0
at gnu.java.net.protocol.http.Request.readResponse(libgcj.so.7rh)
at gnu.java.net.protocol.http.Request.dispatch(libgcj.so.7rh)
at gnu.java.net.protocol.http.HTTPURLConnection.connect(libgcj.so.7rh)
at gnu.java.net.protocol.http.HTTPURLConnection.getInputStream(libgcj.so.7rh)
at thread.run(thread.java:65)
158. Saved http://www.handhelditems.com/ipod-charger-adapter-black-p-4426.html?action=buy_now&sort=2a as 158.html
java.net.ProtocolException: 0
at gnu.java.net.protocol.http.Request.readResponse(libgcj.so.7rh)
at gnu.java.net.protocol.http.Request.dispatch(libgcj.so.7rh)
at gnu.java.net.protocol.http.HTTPURLConnection.connect(libgcj.so.7rh)
at gnu.java.net.protocol.http.HTTPURLConnection.getInputStream(libgcj.so.7rh)
at thread.run(thread.java:65)
164. Saved http://www.handhelditems.com/ipod-nano-soft-leather-wallet-blue-p-7559.html?action=buy_now&sort=2a as 164.html
java.net.ProtocolException: 0
at gnu.java.net.protocol.http.Request.readResponse(libgcj.so.7rh)
at gnu.java.net.protocol.http.Request.dispatch(libgcj.so.7rh)
at gnu.java.net.protocol.http.HTTPURLConnection.connect(libgcj.so.7rh)
at gnu.java.net.protocol.http.HTTPURLConnection.getInputStream(libgcj.so.7rh)
at thread.run(thread.java:65)
I just noticed that the output also contains this error:
java.io.IOException: Connection reset by peer
at java.io.BufferedInputStream.refill(libgcj.so.7rh)
at java.io.BufferedInputStream.read(libgcj.so.7rh)
at gnu.java.net.protocol.http.LimitedLengthInputStream.read(libgcj.so.7rh)
at java.io.BufferedInputStream.refill(libgcj.so.7rh)
at java.io.BufferedInputStream.read(libgcj.so.7rh)
at getpic.crawlpic(getpic.java:14)
at thread.run(thread.java:91)
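Both kinds of trace end in thread.run, and since the try block in run() wraps the whole for loop, the first exception ends that worker. Below is a minimal sketch of handling a failure per URL and retrying a few times instead. It does not explain why the new machine fails; it only keeps a worker going after one bad request. The names Fetcher and fetchWithRetry, the attempt count and the pause are all made up for this example, and the real download code would go inside fetch.

```java
import java.io.IOException;

class RetrySketch {
    // Hypothetical stand-in for "download one URL".
    interface Fetcher {
        String fetch(String url) throws IOException;
    }

    // Tries one URL up to maxAttempts times and returns null if every attempt fails.
    // java.net.ProtocolException is an IOException, so it is caught here as well.
    static String fetchWithRetry(Fetcher fetcher, String url, int maxAttempts, long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetcher.fetch(url);
            } catch (IOException e) {
                System.err.println("attempt " + attempt + " failed for " + url + ": " + e);
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(pauseMillis); // short pause before the next attempt
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        return null;
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        final int[] calls = { 0 };
        // Simulated download that fails twice and then succeeds.
        Fetcher flaky = new Fetcher() {
            public String fetch(String url) throws IOException {
                calls[0]++;
                if (calls[0] < 3) throw new IOException("Connection reset by peer");
                return "<html>ok</html>";
            }
        };
        System.out.println(fetchWithRetry(flaky, "http://example.invalid/", 5, 10));
    }
}
```

Inside the for loop, a null result would be logged as a failure for that URL and the loop would move on to the next one.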