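I wrote a small multithreaded program that downloads web pages: it reads a list of page URLs from a database and uses 10 threads to download the HTML of each page. URLs 1, 11, 21, ..., 1001, ... go to thread 1; URLs 2, 12, 22, ..., 1002, ... go to thread 2; and so on.
The program used to run without problems on one of my Linux machines, but I recently rented a new machine, and running it there produces a strange problem:
    The program runs normally at first, but after fetching somewhere between 100 and 200 pages (that is, a bit more than 10 pages per thread), it fails. The error message is below (every thread dies with this same error):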
java.net.ProtocolException: 0
   at gnu.java.net.protocol.http.Request.readResponse(libgcj.so.7rh)
   at gnu.java.net.protocol.http.Request.dispatch(libgcj.so.7rh)
   at gnu.java.net.protocol.http.HTTPURLConnection.connect(libgcj.so.7rh)
   at gnu.java.net.protocol.http.HTTPURLConnection.getInputStream(libgcj.so.7rh)
   at thread.run(thread.java:65)
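I have tried several different JDK versions, but none of them helped and the error persists. I suspect a system setting is to blame, but I have checked many places and cannot find anything wrong. I would appreciate help from anyone who knows what is going on. Thanks!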
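
One detail in the trace is worth checking: the frames are in gnu.java.net.protocol.http and libgcj.so.7rh, which belong to GNU libgcj (GCJ) rather than to a Sun JDK, whose HTTP classes live under sun.net.www.protocol.http. That suggests the `java` command on the new machine may still resolve to GCJ even after other JDKs were installed. A minimal sketch for printing which runtime is really executing; the class name RuntimeCheck is hypothetical, and only standard system properties are read:

```java
class RuntimeCheck {
    // Builds a short report of the JVM that is actually running this code.
    static String describe() {
        String[] keys = {"java.vendor", "java.version", "java.vm.name", "java.home"};
        StringBuilder sb = new StringBuilder();
        for (String key : keys) {
            sb.append(key).append('=').append(System.getProperty(key)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Running it with the same `java` command used for the crawler should show whether the vendor and java.home point at GCJ or at the JDK that was installed.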

Solutions »

  1.   

    Here is my source code (thread.java):

    import java.io.*;
    import java.sql.*;
    import java.util.*;
    import java.net.*;

    public class thread extends Thread
    {
            String site_n;
            int s_at;
            int e_at;
            int for_id;
            thread(String site_n,int s_at,int e_at,int for_id){
                    this.site_n = site_n;    //site name
                    this.s_at = s_at;        //start
                    this.e_at = e_at;        //end
                    this.for_id = for_id;    //used to assign URLs to threads
            }
            public synchronized void run() {
                    String site = site_n;
                    int fornum = for_id;
                    int start = s_at;
                    int end = e_at;
                    //String site_name = "http://www."+site+".com";
                    ArrayList url_list = new ArrayList();
                    ArrayList url_id_list = new ArrayList();
                    try{
                            Class.forName("com.mysql.jdbc.Driver").newInstance();
                            String getconn = "jdbc:mysql://localhost/"+site+"?user=xxxx&password=xxxx";
                            Connection conn = DriverManager.getConnection(getconn);
                            PreparedStatement stmt = conn.prepareStatement("");
                            ResultSet rs = null;
                            stmt = conn.prepareStatement("select url,url_id from urls where url_id>=? and url_id<=?");
                            stmt.setInt(1,start);
                            stmt.setInt(2,end);
                            rs = stmt.executeQuery();
                            while(rs.next())
                            {
                                    url_id_list.add(rs.getString(2));
                                    url_list.add(rs.getString(1));
                            }
                            if(rs!=null)
                                    rs.close();
                            stmt.close();
                            conn.close();
                    }
                    catch(Exception e){
                            e.printStackTrace();
                    }
                    try{
                            InputStream in = null;
                            InputStreamReader rd = null;
                            BufferedReader br = null;
                            for(int i=fornum; i<url_list.size(); i=i+10){
                                    String save_dir = "./" + String.valueOf(Integer.parseInt(url_id_list.get(i).toString())/10000) + "/" ;
                                    try{
                                            if(!(new File(save_dir).isDirectory()))
                                                    new File(save_dir).mkdir();
                                    }
                                    catch(Exception exp){
                                            exp.printStackTrace();
                                    }
                                    String html = "";
                                    URL this_url = new URL(url_list.get(i).toString());
                                    in = this_url.openConnection().getInputStream();        //this is the line that throws, but only on one of my machines; no other machine reports the error
                                    rd = new InputStreamReader(in);
                                    br = new BufferedReader(rd);
                                    String line = br.readLine();
                                    int img = 0;
                                    while(line != null){
                                            html += line + (char)13;
                                            /////////////////
                                            //  get image  //
                                            /////////////////
                                            if(line.indexOf("product image(s) bof")>0)
                                            {
                                                    img = 1;
                                            }
                                            if(line.indexOf("product image(s) eof")>0)
                                            {
                                                    img = 0;
                                            }
                                            if(img == 1 && line.indexOf("img src")>=0)
                                            {
                                                    int bofimg = line.indexOf("images/");
                                                    int eofimg = line.indexOf("jpg\"")+3;
                                                    if(eofimg>bofimg){
                                                            String imgURL = "http://www."+site+".com/" + line.substring(bofimg,eofimg);
                                                            String imgdir = save_dir + url_id_list.get(i).toString() + ".jpg";
                                                            getpic gp = new getpic();
                                                            gp.crawlpic(imgURL,imgdir);
                                                    }
                                            }
                                            line = br.readLine();
                                    }
                                    br.close();
                                    rd.close();
                                    in.close();
                                    if(html==null)
                                            html = "";
                                    if(html.length()>0){
                                            String saveTo = save_dir+ url_id_list.get(i).toString() +".html";
                                            try {
                                                    new outPut(html, saveTo);
                                            } catch (IOException e) {
                                                    e.printStackTrace();
                                            }
                                            System.out.println((i+start) + ". Saved " + url_list.get(i) + " as " + (i+start) + ".html");
                                    }
                                    else
                                            System.out.println((i+start) + ". failed at " + url_list.get(i));
                            }
                            if(br!=null)
                                    br.close();
                    }catch (Exception e){
                            e.printStackTrace();
                    }
            }
    }
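
In the run() method above, a single try block wraps the whole for loop, so the first exception from getInputStream() ends that thread's loop, which matches the report that every thread dies on this error. A minimal sketch of catching the failure per URL instead, with the reader closed even when a read fails. The names PageFetcher, fetch and fetchAll and the timeout values are hypothetical, and try-with-resources assumes Java 7 or later:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.List;

class PageFetcher {
    // Reads the whole body of one URL. The reader (and the stream under it)
    // is closed by try-with-resources even if a read throws.
    static String fetch(String url) throws IOException {
        URLConnection conn = new URL(url).openConnection();
        conn.setConnectTimeout(10000); // assumed value, in milliseconds
        conn.setReadTimeout(30000);    // assumed value, in milliseconds
        StringBuilder html = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = br.readLine()) != null) {
                html.append(line).append((char) 13); // same separator as the original code
            }
        }
        return html.toString();
    }

    // Fetches every URL in turn; a failure is logged and counted,
    // and the loop moves on to the next URL instead of ending.
    static int fetchAll(List<String> urls) {
        int failed = 0;
        for (String url : urls) {
            try {
                String html = fetch(url);
                System.out.println("Fetched " + url + " (" + html.length() + " chars)");
            } catch (IOException e) {
                failed++;
                System.out.println("failed at " + url + ": " + e);
            }
        }
        return failed;
    }
}
```

This does not explain why the new machine fails, but it keeps one bad response from stopping a whole worker and makes it easier to see which URLs fail.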
      

  2.   

    That was thread.java; below is crawl_what_i_want.java:

    public class crawl_what_i_want {
            public static void main(String[] args){
                    if(args.length!=3){
                            System.out.println("Usage: java crawl_what_i_want [site name] [start at] [end at]");
                    }
                    else
                    {
                            String v1 = args[0];
                            int v2 = Integer.parseInt(args[1]);
                            int v3 = Integer.parseInt(args[2]);

                            thread thread0 = new thread(v1,v2,v3,0);
                            thread thread1 = new thread(v1,v2,v3,1);
                            thread thread2 = new thread(v1,v2,v3,2);
                            thread thread3 = new thread(v1,v2,v3,3);
                            thread thread4 = new thread(v1,v2,v3,4);
                            thread thread5 = new thread(v1,v2,v3,5);
                            thread thread6 = new thread(v1,v2,v3,6);
                            thread thread7 = new thread(v1,v2,v3,7);
                            thread thread8 = new thread(v1,v2,v3,8);
                            thread thread9 = new thread(v1,v2,v3,9);

                            thread0.start();
                            thread1.start();
                            thread2.start();
                            thread3.start();
                            thread4.start();
                            thread5.start();
                            thread6.start();
                            thread7.start();
                            thread8.start();
                            thread9.start();
                    }
            }
    }

    P.S. That is all the error output there is: 10 threads, 10 identical error messages.
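
The ten hand-written thread variables and the i=i+10 stride in run() implement a round-robin split: worker k handles list positions k, k+10, k+20, and so on. A small sketch of the same split with the workers started and joined in a loop. The class name RoundRobin, the method indicesFor and the sample size of 25 are hypothetical, and the workers only print their share instead of downloading anything:

```java
import java.util.ArrayList;
import java.util.List;

class RoundRobin {
    // Positions handled by one worker: worker, worker + workers, worker + 2*workers, ...
    static List<Integer> indicesFor(int worker, int total, int workers) {
        List<Integer> out = new ArrayList<Integer>();
        for (int i = worker; i < total; i += workers) {
            out.add(i);
        }
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        final int workers = 10; // same thread count as the original program
        final int total = 25;   // assumed number of URLs, for illustration only
        Thread[] pool = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int id = w;
            pool[w] = new Thread(new Runnable() {
                public void run() {
                    System.out.println("worker " + id + " -> " + indicesFor(id, total, workers));
                }
            });
            pool[w].start();
        }
        // Wait for every worker so main returns only after all of them finish.
        for (Thread t : pool) {
            t.join();
        }
    }
}
```

Keeping the thread count in one variable also makes it easy to try a smaller number of workers and see whether the errors on the new machine depend on concurrency.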
      

  3.   

    After running "java crawl_what_i_want handhelditems 1 10000", the output looks like this:
    9. Saved http://www.handhelditems.com/camera-and-photo-c-4807.html as 9.html
    5. Saved http://www.handhelditems.com/cellular-c-3524.html as 5.html
    4. Saved http://www.handhelditems.com/giftideas.php as 4.html
    ...
    ...
    ...
    107. Saved http://www.handhelditems.com/create_account.php as 107.html
    115. Saved http://www.handhelditems.com/index.php as 115.html
    114. Saved http://www.handhelditems.com/discounts.php as 114.html
    101. Saved http://www.handhelditems.com/promotion.php as 101.html
    126. Saved http://www.handhelditems.com/motorola-motorola-razr-c-3524_4179_4829.html as 126.html
    java.net.ProtocolException: 0
       at gnu.java.net.protocol.http.Request.readResponse(libgcj.so.7rh)
       at gnu.java.net.protocol.http.Request.dispatch(libgcj.so.7rh)
       at gnu.java.net.protocol.http.HTTPURLConnection.connect(libgcj.so.7rh)
       at gnu.java.net.protocol.http.HTTPURLConnection.getInputStream(libgcj.so.7rh)
       at thread.run(thread.java:65)
    java.io.IOException: Connection reset by peer
       at java.io.BufferedInputStream.refill(libgcj.so.7rh)
       at java.io.BufferedInputStream.read(libgcj.so.7rh)
       at gnu.java.net.protocol.http.LimitedLengthInputStream.read(libgcj.so.7rh)
       at java.io.BufferedInputStream.refill(libgcj.so.7rh)
       at java.io.BufferedInputStream.read(libgcj.so.7rh)
       at getpic.crawlpic(getpic.java:14)
       at thread.run(thread.java:91)
    117. Saved http://www.handhelditems.com/rma_request.php as 117.html
    111. Saved http://www.handhelditems.com/security.php as 111.html
    119. Saved http://www.handhelditems.com/slingshot-animal-combo-monkey-chicken-duck-frog-p-5065.html?action=buy_now&sort=2a as 119.html
    java.net.ProtocolException: 0
       at gnu.java.net.protocol.http.Request.readResponse(libgcj.so.7rh)
       at gnu.java.net.protocol.http.Request.dispatch(libgcj.so.7rh)
       at gnu.java.net.protocol.http.HTTPURLConnection.connect(libgcj.so.7rh)
       at gnu.java.net.protocol.http.HTTPURLConnection.getInputStream(libgcj.so.7rh)
       at thread.run(thread.java:65)
    129. Saved http://www.handhelditems.com/cellular-universal-accessories-c-3524_5135.html as 129.html
    127. Saved http://www.handhelditems.com/cellular-garmin-ique-c-3524_5138.html as 127.html
    118. Saved http://www.handhelditems.com/slingshot-animal-combo-monkey-chicken-duck-frog-p-5065.html as 118.html
    128. Saved http://www.handhelditems.com/cellular-htc-c-3524_5132.html as 128.html
    136. Saved http://www.handhelditems.com/ipod-shuffle-generation-charger-green-p-7596.html as 136.html
    139. Saved http://www.handhelditems.com/ipod-shuffle-generation-charger-orange-p-7597.html as 139.html
    137. Saved http://www.handhelditems.com/ipod-shuffle-generation-charger-pink-p-7598.html as 137.html
    124. Saved http://www.handhelditems.com/radio-control-mini-hovercraft-2pcs-combo-p-7659.html as 124.html
    149. Saved http://www.handhelditems.com/ipod-1800mah-backup-battery-charger-adapter-p-5591.html?action=buy_now&sort=2a as 149.html
    java.net.ProtocolException: 0
       at gnu.java.net.protocol.http.Request.readResponse(libgcj.so.7rh)
       at gnu.java.net.protocol.http.Request.dispatch(libgcj.so.7rh)
       at gnu.java.net.protocol.http.HTTPURLConnection.connect(libgcj.so.7rh)
       at gnu.java.net.protocol.http.HTTPURLConnection.getInputStream(libgcj.so.7rh)
       at thread.run(thread.java:65)
    138. Saved http://www.handhelditems.com/ipod-shuffle-generation-charger-white-p-7232.html as 138.html
    134. Saved http://www.handhelditems.com/apple-ipod-shuffle-apple-ipod-shuffle-chargers-adapters-c-4_1861_1868.html as 134.html
    147. Saved http://www.handhelditems.com/ipod-home-theater-dock-p-7743.html?action=buy_now&sort=2a as 147.html
    146. Saved http://www.handhelditems.com/ipod-home-theater-dock-p-7743.html as 146.html
    144. Saved http://www.handhelditems.com/deep-bass-vibe-earphone-white-p-7545.html as 144.html
    157. Saved http://www.handhelditems.com/ipod-charger-adapter-black-p-4426.html as 157.html
    156. Saved http://www.handhelditems.com/index.php?cPath=4676&sort=2a&action=buy_now&products_id=4423 as 156.html
    167. Saved http://www.handhelditems.com/isolate-hifi-earphone-blue-p-5048.html as 167.html
    148. Saved http://www.handhelditems.com/ipod-1800mah-backup-battery-charger-adapter-p-5591.html as 148.html
    166. Saved http://www.handhelditems.com/ipod-video-soft-leather-wallet-black-p-7562.html?action=buy_now&sort=2a as 166.html
    154. Saved http://www.handhelditems.com/ipod-bubblegum-transmitter-green-p-5080.html as 154.html
    java.net.ProtocolException: 0
       at gnu.java.net.protocol.http.Request.readResponse(libgcj.so.7rh)
       at gnu.java.net.protocol.http.Request.dispatch(libgcj.so.7rh)
       at gnu.java.net.protocol.http.HTTPURLConnection.connect(libgcj.so.7rh)
       at gnu.java.net.protocol.http.HTTPURLConnection.getInputStream(libgcj.so.7rh)
       at thread.run(thread.java:65)
    158. Saved http://www.handhelditems.com/ipod-charger-adapter-black-p-4426.html?action=buy_now&sort=2a as 158.html
    java.net.ProtocolException: 0
       at gnu.java.net.protocol.http.Request.readResponse(libgcj.so.7rh)
       at gnu.java.net.protocol.http.Request.dispatch(libgcj.so.7rh)
       at gnu.java.net.protocol.http.HTTPURLConnection.connect(libgcj.so.7rh)
       at gnu.java.net.protocol.http.HTTPURLConnection.getInputStream(libgcj.so.7rh)
       at thread.run(thread.java:65)
    164. Saved http://www.handhelditems.com/ipod-nano-soft-leather-wallet-blue-p-7559.html?action=buy_now&sort=2a as 164.html
    java.net.ProtocolException: 0
       at gnu.java.net.protocol.http.Request.readResponse(libgcj.so.7rh)
       at gnu.java.net.protocol.http.Request.dispatch(libgcj.so.7rh)
       at gnu.java.net.protocol.http.HTTPURLConnection.connect(libgcj.so.7rh)
       at gnu.java.net.protocol.http.HTTPURLConnection.getInputStream(libgcj.so.7rh)
       at thread.run(thread.java:65)
    I just noticed that the output also contains this error:
    java.io.IOException: Connection reset by peer
       at java.io.BufferedInputStream.refill(libgcj.so.7rh)
       at java.io.BufferedInputStream.read(libgcj.so.7rh)
       at gnu.java.net.protocol.http.LimitedLengthInputStream.read(libgcj.so.7rh)
       at java.io.BufferedInputStream.refill(libgcj.so.7rh)
       at java.io.BufferedInputStream.read(libgcj.so.7rh)
       at getpic.crawlpic(getpic.java:14)
       at thread.run(thread.java:91)
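
"Connection reset by peer" means the remote side closed the connection. With 10 threads requesting pages from one site, this may be the server, or something between the two machines, limiting connections, although the log alone does not prove that. A hedged sketch of retrying a failed download with a growing pause; the names Retry, Fetcher and fetchWithRetry are hypothetical, and the attempt count and delays are assumptions to tune:

```java
import java.io.IOException;

class Retry {
    // Hypothetical callback type: anything that downloads one URL and may fail.
    interface Fetcher {
        String fetch(String url) throws IOException;
    }

    // Tries up to maxAttempts times, doubling the pause after each failure.
    // Returns null if every attempt failed or the thread was interrupted.
    static String fetchWithRetry(Fetcher fetcher, String url, int maxAttempts, long firstDelayMs) {
        long delay = firstDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetcher.fetch(url);
            } catch (IOException e) {
                System.out.println("attempt " + attempt + " failed for " + url + ": " + e);
                if (attempt == maxAttempts) {
                    break;
                }
            }
            try {
                Thread.sleep(delay);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
                return null;
            }
            delay *= 2;
        }
        return null;
    }
}
```

If retries with a pause succeed where immediate requests fail, that points toward rate limiting rather than a local configuration problem.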