The page URL is http://sc.hiapk.com/apps_0_1_1
However, all the Chinese text in the downloaded page comes out garbled. I searched earlier threads on similar problems and confirmed:
1. It is not a character-set problem: InputStreamReader(input,"utf-8")
2. The page is not compressed
3. IE opens the page normally
Could anyone help me figure out what is going on? Here is the code:
import java.net.*;
import java.awt.*;
import java.io.*;
import java.util.zip.GZIPInputStream;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

public class getDataByURL {
    public static void main(String[] args) {
        (new getDataByURL()).doWrite();
    }

    private void doStore(String file_name, InputStream input) {
        try {
            OutputStreamWriter ow = new OutputStreamWriter(new FileOutputStream(file_name, true));
            BufferedReader io = new BufferedReader(new InputStreamReader(input, "utf-8"));
            String s;
            while ((s = io.readLine()) != null) {
                ow.write(s);
            }
            ow.flush();
            ow.close();
        } catch (IOException e) {
        }
    }

    public void doWrite() {
        String url_str = "http://sc.hiapk.com/apps_0_1_1";
        try {
            HttpClient httpclient = new DefaultHttpClient();
            HttpGet httpget = new HttpGet(url_str);
            httpget.addHeader("Accept-Language", "en-us");
            httpget.addHeader("Accept-Encoding", "gzip,deflate");
            HttpResponse response = httpclient.execute(httpget);
            HttpEntity entity = response.getEntity();
            doStore("1.html", entity.getContent());
        } catch (MalformedURLException e1) {
            System.out.println("exception");
        } catch (IOException e2) {
            System.out.println("exception");
        }
    }
}
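One detail worth noting in doStore above: the OutputStreamWriter is constructed without a charset, so it uses the platform default encoding rather than UTF-8. Below is a sketch of a doStore variant that pins the output charset explicitly (class name and the round-trip in main are illustrative only; it uses Java 7 try-with-resources):

```java
import java.io.*;

public class StoreUtf8 {
    // Like the original doStore, but the writer's charset is set to UTF-8
    // explicitly instead of falling back to the platform default.
    static void doStore(String fileName, InputStream input) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(input, "utf-8"));
             Writer out = new OutputStreamWriter(new FileOutputStream(fileName, true), "utf-8")) {
            String s;
            while ((s = in.readLine()) != null) {
                out.write(s);
                out.write('\n'); // readLine() strips the line terminator
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Round-trip check with known UTF-8 input instead of a live HTTP fetch
        byte[] utf8 = "浏览器".getBytes("utf-8");
        doStore("out.html", new ByteArrayInputStream(utf8));
    }
}
```

With this change the saved file should contain the same UTF-8 bytes that arrived on the wire. Note also that readLine() drops newlines, so the original code writes the whole page as one long line; the write('\n') above restores line breaks.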
The page contains the character "浏".
Its UTF-8 encoding is E6 B5 8F.
The bytes on the wire are also E6 B5 8F.
But what ends up in the saved file is E6 B5 3F, which can no longer be decoded correctly. The next step is to track down exactly which stage introduces this corruption. I read up on character-set issues and this appears to be a common phenomenon; below are some write-ups on character sets worth reading:
http://tech.163.com/06/0518/09/2HD6OPIV0009159T.html
http://www.tot.name/show/3/7/20051201213200.htm
http://technic.txwm.com/webpage/v35508.html
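The trailing 3F byte is itself a strong clue: 0x3F is ASCII '?', which is exactly the replacement byte a Java charset encoder substitutes for a character it cannot represent. A small demonstration (charset names here are just the standard Java aliases):

```java
import java.io.UnsupportedEncodingException;

public class ReplacementDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "浏"; // U+6D4F

        // UTF-8 can represent the character: prints E6 B5 8F
        for (byte b : s.getBytes("UTF-8")) System.out.printf("%02X ", b);
        System.out.println();

        // windows-1252 (Cp1252) cannot encode U+6D4F, so the encoder
        // substitutes '?' (0x3F) -- the same byte seen in the saved file
        for (byte b : s.getBytes("windows-1252")) System.out.printf("%02X ", b);
        System.out.println();
    }
}
```

So wherever a 3F appears in place of a valid UTF-8 byte, some step in the pipeline is encoding with a charset that cannot represent the character.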
[Reply] Some characters just wouldn't convert for me either and all came out as ??? — even common characters. I didn't dig into it at the time; rewriting in C++ against the Win32 API made the problem go away. I suggest you try saving the file as UTF-8, UNICODE, GB2312, and so on, and compare the results. It may be an issue with Java's character-set support.
1. My machine runs an English-language system, so the default encoding for source files is detected as Cp1252.
2. When the page data is downloaded, declared as UTF-8, and read into the BufferedReader, everything is still intact.
3. When writing to the file, the system treats the contents of the BufferedReader as Cp1252, and that is where the corruption happens.
Step 3 is still only a guess; I plan to write a small test program to verify it.
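The step-3 guess can be checked with a short probe like the one below (file names are illustrative). It writes the same character through an OutputStreamWriter constructed without a charset and one constructed with "utf-8", then dumps the resulting bytes alongside the JVM's default encoding:

```java
import java.io.*;
import java.nio.file.*;

public class EncodingProbe {
    public static void main(String[] args) throws IOException {
        String s = "浏"; // UTF-8 bytes: E6 B5 8F

        // No charset given: uses the platform default (file.encoding),
        // i.e. Cp1252 on an English Windows system
        Writer dflt = new OutputStreamWriter(new FileOutputStream("default.bin"));
        dflt.write(s);
        dflt.close();

        // Charset pinned to UTF-8: should preserve E6 B5 8F
        Writer utf8 = new OutputStreamWriter(new FileOutputStream("utf8.bin"), "utf-8");
        utf8.write(s);
        utf8.close();

        System.out.println("default encoding: " + System.getProperty("file.encoding"));
        dump("default.bin");
        dump("utf8.bin");
    }

    static void dump(String name) throws IOException {
        StringBuilder sb = new StringBuilder(name + ":");
        for (byte b : Files.readAllBytes(Paths.get(name))) sb.append(String.format(" %02X", b));
        System.out.println(sb);
    }
}
```

On a Cp1252 system, default.bin should contain the single replacement byte 3F while utf8.bin contains E6 B5 8F, which would confirm that the unparameterized OutputStreamWriter in doStore is the culprit.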