Garbled text when fetching a website's page source

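I started learning Java recently. For a current task my program often needs to fetch the source of web pages, but the source I get back is frequently garbled. What is going on?
Here is the code I use to fetch the page source from Google:
import java.io.BufferedReader;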
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test2 {
    public static void main(String[] args) throws IOException {
        HttpURLConnection httpurlconnection = null;
        String googlesite = "http://www.google.cn/search?hl=zh-CN&newwindow=1&client=aff-maxthon&hs=7tK&channel=channel4&source=hp&q=%E4%BC%98%E9%85%B7&btnG=Google+%E6%90%9C%E7%B4%A2&aq=f&oq=";
        String line = null;
        URL url = new URL(googlesite);
        httpurlconnection = (HttpURLConnection) url.openConnection();
        httpurlconnection.setConnectTimeout(60000);
        httpurlconnection.setReadTimeout(60000);
        httpurlconnection.setDoOutput(false);
        httpurlconnection.setDoInput(true);
        httpurlconnection.setRequestMethod("GET");
        httpurlconnection.setRequestProperty("x-requested-with", "XMLHttpRequest");
        httpurlconnection.setRequestProperty("Accept-Language", "zh-cn");
        httpurlconnection.setRequestProperty("Referer",
                "http://www.google.cn/search?hl=zh-CN&newwindow=1&client=aff-maxthon&hs=7tK&channel=channel4&source=hp&q=%E4%BC%98%E9%85%B7&btnG=Google+%E6%90%9C%E7%B4%A2&aq=f&oq=");
        httpurlconnection.setRequestProperty("Accept",
                "image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*");
        httpurlconnection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        httpurlconnection.setRequestProperty("Accept-Encoding", "gzip, deflate");
        httpurlconnection.setRequestProperty("User-Agent",
                "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Mathon 2.0)");
        httpurlconnection.setRequestProperty("Host", "www.google.cn");
        httpurlconnection.setRequestProperty("Connection", "Keep-Alive");
        httpurlconnection.connect();
        BufferedReader brr = new BufferedReader(new InputStreamReader(httpurlconnection.getInputStream(), "utf-8"));

        while ((line = brr.readLine()) != null) {
            System.out.println(line);
        }
        httpurlconnection.disconnect();
    }
}

Solutions »

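The output is garbled because your request tells the server you accept gzip, so the body comes back gzip-compressed and you then read the compressed bytes as if they were UTF-8 text.
1. Comment out this line:
// httpurlconnection.setRequestProperty("Accept-Encoding", "gzip, deflate");
2. Or keep the header and decompress the response yourself:
httpurlconnection.setRequestProperty("Accept-Encoding","gzip, deflate");

Garbled output usually has one of two causes.
One is a character-encoding mismatch.
The other is that the transferred data may be compressed. You can tell which one you are dealing with from the HTTP headers of the response.
Here it is probably compression, because the HTTP headers of your request declare that your program supports gzip-compressed data.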
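
To make the "check the response headers" advice concrete, here is a minimal sketch. The class name ResponseDiagnosis and both method names are made up for illustration; in the original program the two header values would come from httpurlconnection.getContentEncoding() and httpurlconnection.getContentType(). It assumes the server reports compression in Content-Encoding and the character set in the charset parameter of Content-Type, which is the usual convention but not guaranteed for every site.

```java
import java.util.Locale;

/** Hypothetical helper: classifies a response from two of its header values. */
final class ResponseDiagnosis {

    /** True when the Content-Encoding value says the body is compressed. */
    static boolean isCompressed(String contentEncoding) {
        if (contentEncoding == null) {
            return false; // header absent: the body was sent uncompressed
        }
        String value = contentEncoding.trim().toLowerCase(Locale.ROOT);
        return value.equals("gzip") || value.equals("x-gzip") || value.equals("deflate");
    }

    /** Returns the charset parameter of a Content-Type value, or fallback if there is none. */
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String cs = p.substring("charset=".length()).trim();
                // Strip optional surrounding quotes, e.g. charset="utf-8".
                if (cs.length() >= 2 && cs.startsWith("\"") && cs.endsWith("\"")) {
                    cs = cs.substring(1, cs.length() - 1);
                }
                return cs.isEmpty() ? fallback : cs;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Sample header values; a real program would read them from the connection.
        System.out.println("compressed: " + isCompressed("gzip"));
        System.out.println("charset: " + charsetOf("text/html; charset=GB2312", "utf-8"));
    }
}
```

If isCompressed returns true, the bytes have to be decompressed before decoding; if it returns false and the text is still garbled, the charset passed to InputStreamReader is the more likely culprit.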
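2. gzip: wrap the response stream in a GZIPInputStream (this needs import java.util.zip.GZIPInputStream) and keep the explicit charset:
BufferedReader brr = new BufferedReader(new InputStreamReader(new GZIPInputStream(httpurlconnection.getInputStream()), "utf-8"));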
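
The one-liner above always decompresses, which fails if the server happens to answer without gzip. A slightly more defensive sketch is shown here; GzipBodyDemo, decode, readAll and gzip are hypothetical names, and the "response body" is produced in memory so the example runs without a network connection. In the original program the second argument of decode would be httpurlconnection.getContentEncoding().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical demo: decompress only when the response says it is gzip. */
final class GzipBodyDemo {

    /** Wraps raw in a GZIPInputStream only if the Content-Encoding value is gzip. */
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw; // not gzip: hand the stream back unchanged
    }

    /** Reads the whole stream as UTF-8 text, ending every line with '\n'. */
    static String readAll(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    /** Compresses text in memory; stands in for a gzip-encoded response body. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = gzip("<html><title>test</title></html>");
        System.out.print(readAll(decode(new ByteArrayInputStream(body), "gzip")));
    }
}
```

Only gzip is handled here; a server that answers with deflate would need a java.util.zip.InflaterInputStream in the same place.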
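Thanks, everyone. I didn't actually know what gzip meant; I copied that block of setRequestProperty calls from code a senior labmate wrote. But why do I get an HTTP 403 error when I don't set any of these request properties?

gzip is a data compression format. I think the other setRequestProperty calls are all optional; setting more of them only makes the request look more like a real browser. You can get by with just these two:
      httpurlconnection.setRequestProperty("User-Agent","Mozilla/5.0");// pose as a browser (Mozilla/5.0 is the prefix Firefox and most other browsers send)
       httpurlconnection.setRequestProperty("Accept-Encoding","gzip,deflate");// accept compressed data; the response must then be decompressed before reading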
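
Putting the two-header suggestion together with the decompression step might look like the sketch below. MinimalFetch, open and fetch are hypothetical names, and the example advertises only gzip (not deflate) because gzip is the only encoding it decodes. Whether these two headers are enough to avoid the 403 depends on the server; that part is this thread's experience, not something the code can guarantee.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

/** Hypothetical minimal fetcher: two request headers plus gzip handling. */
final class MinimalFetch {

    /** Creates the connection and sets the two headers; nothing is sent yet. */
    static HttpURLConnection open(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setConnectTimeout(60000);
        conn.setReadTimeout(60000);
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        // Advertise only what fetch() below can actually decode.
        conn.setRequestProperty("Accept-Encoding", "gzip");
        return conn;
    }

    /** Sends the request and returns the body as UTF-8 text, decompressing gzip if used. */
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = open(address);
        try {
            InputStream in = conn.getInputStream(); // the request is sent here
            if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
                in = new GZIPInputStream(in);
            }
            StringBuilder page = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    page.append(line).append('\n');
                }
            }
            return page.toString();
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling MinimalFetch.fetch(googlesite) would replace the body of main in the original program; UTF-8 is assumed for the page, as in the original code.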