Lines come out in the wrong order when reading a web page in Java

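First, the code:
Spider class:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.ProtocolException;
import java.net.URL;

public class Spider implements Runnable{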

    HttpURLConnection huc;
    InputStream is;
    BufferedReader reader;
    String url;

    public Spider(String str){
        try {
            url=str;
        } catch (Exception e) {
            e.printStackTrace();
        }
        try {
            huc=(HttpURLConnection)new URL(url).openConnection();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }

        new Thread(this).start();
    }

    public void run() {
        try {
            huc.setRequestMethod("GET");
            huc.setRequestProperty("user-agent","mozilla/4.0 (compatible; msie 6.0; windows 2000)");
        } catch (ProtocolException e) {
            e.printStackTrace();
        }
        try {
            huc.setUseCaches(true);
            huc.connect();

        } catch (IOException e) {
            e.printStackTrace();
        }
        try {
            is=huc.getInputStream();
            reader=new BufferedReader(new InputStreamReader(is,huc.getContentType().equals("text-html; charset=gb2312")?"gb2312":"UTF-8"));
            String str;
            System.out.flush();
            while((str=reader.readLine())!=null){
                System.out.println(str);
                System.out.flush();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }finally{
            try {
                reader.close();
                is.close();
                huc.disconnect();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return;
    }
}
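Comparing the program's output with what the browser shows via right-click "View Source", the output looks like this:
<html>
        <title>XXXXXXXXXXXXX</title>
       </head>
  <body   >  <head>
The <head> that should be on the second line has ended up further down, so lines are coming out in the wrong place. I don't know why. The position isn't fixed either: every run gives a different result. Could someone explain this?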

Solutions »

For readability, here is a condensed version of the code:
public class Spider implements Runnable{

    HttpURLConnection huc;
    InputStream is;
    BufferedReader reader;
    String url;

    public Spider(String str){
        url=str;
        huc=(HttpURLConnection)new URL(url).openConnection();
        new Thread(this).start();
    }

    public void run() {
            huc.setRequestMethod("GET");
            huc.setRequestProperty("user-agent","mozilla/4.0 (compatible; msie 6.0; windows 2000)");
            huc.setUseCaches(true);
            huc.connect();

            is=huc.getInputStream();
            reader=new BufferedReader(new InputStreamReader(is,huc.getContentType().equals("text-html; charset=gb2312")?"gb2312":"UTF-8"));
            String str;
            System.out.flush();
            while((str=reader.readLine())!=null){
                System.out.println(str);
                System.out.flush();
            }

                reader.close();
                is.close();
                huc.disconnect();

        return;
    }
}
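A side note on the charset choice in the code above: it compares the whole Content-Type value with the literal "text-html; charset=gb2312". Servers normally send the media type as text/html (with a slash), so that comparison would be expected to fail and the reader would fall back to UTF-8 even for a GB2312 page; getContentType() can also return null. This concerns decoding, not necessarily the line-order problem. A minimal sketch of reading the charset parameter instead, where ContentTypeCharset and charsetOf are hypothetical names that do not appear in the original post:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original Spider class.
final class ContentTypeCharset {

    // Returns the charset named in a Content-Type header value,
    // or the fallback when the header is missing or names no usable charset.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback; // URLConnection.getContentType() may return null
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive match on the "charset=" parameter name.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetOf("text/html", StandardCharsets.UTF_8));
    }
}
```

In the Spider code this could be used as new InputStreamReader(is, ContentTypeCharset.charsetOf(huc.getContentType(), StandardCharsets.UTF_8)).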
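Is the right-click View Source result the same every time? Also, is it only <head> that ends up in the wrong place, or do other lines move too?
View Source gives the same result every time.
Other lines move too, though not many, and it isn't always <head>. It looks random, or maybe I just haven't found the pattern. The position looks random as well.
Can't you just set the values and then get the result to obtain the page content???
  System.out.flush();
Remove this.
Also, don't read it like this: (str=reader.readLine())!=null
The page is something that gets parsed, so reading it this way isn't right. Read it by bytes.
Also, I'm using a genuine English-language XP. To read Chinese characters, I right-clicked the project in Eclipse, went to Properties -> Text file encoding and selected UTF-8. Could that have any effect?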
byte [] buffer=new byte[1024];
int read = 0;
while ((read = is.read(buffer)) != -1) {
.............................
}
is.close();
is = null;
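The loop body above is left elided. One way to fill it in, offered as a sketch and not necessarily what the poster had in mind, is to collect all bytes first and decode once at the end, so that a multi-byte character split across two 1024-byte reads is not corrupted. ByteReader and readFully are hypothetical names; in the Spider code the stream would come from huc.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the "read by bytes" suggestion.
final class ByteReader {

    // Reads the stream to its end and decodes the collected bytes in one step.
    static String readFully(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int read;
        while ((read = in.read(buffer)) != -1) {
            // Only the first 'read' bytes of the buffer are valid on each pass.
            out.write(buffer, 0, read);
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<html>\n<head>\n</head>\n</html>\n".getBytes(StandardCharsets.UTF_8);
        System.out.print(readFully(new ByteArrayInputStream(page), StandardCharsets.UTF_8));
    }
}
```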
You want to visit a site and get the source of the page you visited, right? If so, try this:
public static String cc(String leibie, String num) {
        StringBuffer temp = new StringBuffer();
        try {
            System.out.println(leibie);
            System.out.println(num);
            String url = "http://www.yb983.com/jiaojing/ser.php";
            HttpURLConnection uc = (HttpURLConnection)new URL(url).
                                   openConnection();
            uc.setConnectTimeout(10000);
            uc.setDoOutput(true);
            uc.setRequestMethod("GET");
            uc.setUseCaches(false);
            DataOutputStream out = new DataOutputStream(uc.getOutputStream());
            // parameters to send
            String s = URLEncoder.encode("ra", "GB2312") + "=" +
                       URLEncoder.encode(leibie, "GB2312");
            s += "&" + URLEncoder.encode("keyword", "GB2312") + "=" +
                    URLEncoder.encode(num, "GB2312");
            // DataOutputStream.writeBytes writes each 16-bit Unicode char of the string to the stream as a single 8-bit byte
            out.writeBytes(s);
            out.flush();
            out.close();
            InputStream in = new BufferedInputStream(uc.getInputStream());
            Reader rd = new InputStreamReader(in, "Gb2312");
            int c = 0;
            while ((c = rd.read()) != -1) {
                temp.append((char) c);
            }
            System.out.println(temp.toString());
            in.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return temp.toString();
    }

    public static void main(String[] a){
        test.cc("1","吉H");
    }
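Copy and paste it, run it, and look at the console output. Replace the URL with the address of the page you want to fetch and pass in the matching parameters; either POST or GET can be used. I'm not sure whether this is what you're after.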
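A note on the code above: it sets the method to GET but then writes the parameters to the request body. As far as I know, the JDK's HttpURLConnection switches such a request to POST once getOutputStream() is called, so the call is probably sent as a POST. For a real GET the parameters would go into the URL as a query string. A minimal sketch of building one, where QueryStringSketch and toQuery are hypothetical names and UTF-8 is used only for the example (the post uses GB2312):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the code posted in the thread.
final class QueryStringSketch {

    // Joins URL-encoded name=value pairs with '&', in map iteration order.
    static String toQuery(Map<String, String> params, String charsetName)
            throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), charsetName))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), charsetName));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("ra", "1");
        params.put("keyword", "\u5409H");
        // The result would be appended to the address after a '?'.
        System.out.println(toQuery(params, "UTF-8"));
    }
}
```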
while((str=reader.readLine())!=null){
                System.out.println(str);
                System.out.flush();
            }
(str=reader.readLine())!=null
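When reading page data, the page sometimes contains large stretches of whitespace that are blank but not null. I don't know whether that has any effect here.
I also think it may have an effect. Every now and then an exception pops up pointing at this line, like this: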
java.io.IOException: Stream closed
at java.io.BufferedReader.ensureOpen(Unknown Source)
at java.io.BufferedReader.readLine(Unknown Source)
at java.io.BufferedReader.readLine(Unknown Source)
at Spider.run(Spider.java:44)
at java.lang.Thread.run(Unknown Source)
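The trace shows BufferedReader.ensureOpen rejecting a readLine() call, which happens when close() has already been called on that same reader. The thread does not say how that came about. One possibility, stated purely as an assumption, is that run() executes on more than one thread for the same Spider object (the constructor already starts a thread), so both share the huc, is and reader fields. A sketch that keeps the reader local to the method and closes it with try-with-resources, so no other thread can reach it; LocalReaderSketch and readLines are hypothetical names:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original Spider class.
final class LocalReaderSketch {

    // The reader lives only inside this call, so a concurrent call
    // on another thread works on its own reader and cannot close this one.
    static List<String> readLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } // reader (and the wrapped stream) is closed here, also on exceptions
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<html>\n<head>\n</head>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readLines(new ByteArrayInputStream(page), StandardCharsets.UTF_8));
    }
}
```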
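I found that the code I used before is the same as yours. Could you send me the address? I'll try it with mine:
/**
 * Fetches a page and returns its source.
 * @param tempurl address of the page
 * @param code name of the charset used to decode the response
 * @return the page content, or null if anything fails
 */
public static String getHtml(String tempurl, String code) {
    try {
        URL url = new URL(tempurl);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.connect();
        InputStream is = conn.getInputStream();
        BufferedReader br = new BufferedReader(new InputStreamReader(is, code));
        String line = "";
        StringBuffer resultBuffer = new StringBuffer();
        while ((line = br.readLine()) != null) {
            // readLine() strips the line terminator, so lines are joined without breaks
            resultBuffer.append(line);
        }
        br.close();
        is.close();
        conn.disconnect();
        return resultBuffer.toString();
    } catch (Exception e) {} // errors are swallowed; the method then returns null
    return null;
}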
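Uh, still here. I'm testing with this page: http://v.youku.com/v_show/id_XMTg2NjM4MTI=.html
Thanks a million!
Perfect! Solved, the lines are no longer out of order!
But why? My earlier code seems almost the same. Why did its lines get mixed up?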
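The thread never settles this question. One visible difference is that getHtml collects the whole page in a buffer and returns it, while Spider prints line by line from a thread started in its constructor. If more than one thread writes to the console that way, their lines can interleave differently on every run; whether that is what happened here is an assumption. A sketch of printing one page as a single unit under a shared lock, where BlockPrinter and printBlock are hypothetical names:

```java
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the code posted in the thread.
final class BlockPrinter {

    private static final Object LOCK = new Object();

    // Builds the whole block first, then writes it while holding the lock,
    // so lines from other threads using this method cannot land in between.
    static void printBlock(PrintStream out, List<String> lines) {
        StringBuilder block = new StringBuilder();
        for (String line : lines) {
            block.append(line).append('\n');
        }
        synchronized (LOCK) {
            out.print(block.toString());
            out.flush();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> printBlock(System.out, Arrays.asList("<html>", "<head>", "</head>")));
        Thread b = new Thread(() -> printBlock(System.out, Arrays.asList("<p>", "text", "</p>")));
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

The order of the two blocks can still vary between runs; only the lines inside each block stay together.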
I tried this one, and the result had more than 600 extra duplicated lines. I don't know why.
Anyway, thanks~
Why not just use htmlparser?