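I'm fairly new to Java. For a task I'm working on, my program often has to fetch the source of web pages, but the source I get back is frequently garbled. Why is this happening?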
Here is the code I use to fetch the page source from Google:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class Test2 {
    public static void main(String[] args) throws IOException {
        HttpURLConnection httpurlconnection = null;
        // Google search URL (the q= parameter is the URL-encoded query)
        String googlesite = "http://www.google.cn/search?hl=zh-CN&newwindow=1&client=aff-maxthon&hs=7tK&channel=channel4&source=hp&q=%E4%BC%98%E9%85%B7&btnG=Google+%E6%90%9C%E7%B4%A2&aq=f&oq=";
        String line = null;
        URL url = new URL(googlesite);
        httpurlconnection = (HttpURLConnection) url.openConnection();
        httpurlconnection.setConnectTimeout(60000);
        httpurlconnection.setReadTimeout(60000);
        httpurlconnection.setDoOutput(false);
        httpurlconnection.setDoInput(true);
        httpurlconnection.setRequestMethod("GET");
        // Request headers copied from existing code to imitate a browser
        httpurlconnection.setRequestProperty("x-requested-with", "XMLHttpRequest");
        httpurlconnection.setRequestProperty("Accept-Language", "zh-cn");
        httpurlconnection.setRequestProperty("Referer",
                "http://www.google.cn/search?hl=zh-CN&newwindow=1&client=aff-maxthon&hs=7tK&channel=channel4&source=hp&q=%E4%BC%98%E9%85%B7&btnG=Google+%E6%90%9C%E7%B4%A2&aq=f&oq=");
        httpurlconnection.setRequestProperty("Accept",
                "image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*");
        httpurlconnection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        // Tells the server that gzip/deflate-compressed responses are acceptable
        httpurlconnection.setRequestProperty("Accept-Encoding", "gzip, deflate");
        httpurlconnection.setRequestProperty("User-Agent",
                "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Mathon 2.0)");
        httpurlconnection.setRequestProperty("Host", "www.google.cn");
        httpurlconnection.setRequestProperty("Connection", "Keep-Alive");
        httpurlconnection.connect();
        // Read the response line by line, decoding the bytes as UTF-8
        BufferedReader brr = new BufferedReader(new InputStreamReader(httpurlconnection.getInputStream(), "utf-8"));
        while ((line = brr.readLine()) != null) {
            System.out.println(line);
        }
        brr.close();
        httpurlconnection.disconnect();
    }
}

Answers

  1.   


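    It's garbled because your request says it accepts gzip, so the server sends the page gzip-compressed. Two ways to fix it:
    1. Comment out this line:
    // httpurlconnection.setRequestProperty("Accept-Encoding", "gzip, deflate");
    2. Or decompress the response before reading it as text.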
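For option 1 in this answer, here is a minimal sketch of a request that simply never sends an Accept-Encoding header, so there should be nothing to decompress. The class name PlainFetch, the placeholder URL http://www.example.com/ and the UTF-8 decoding are assumptions for illustration. Strictly speaking, HTTP does not forbid a server from compressing a response when that header is absent, so the sketch also prints Content-Encoding, which should come back null for an uncompressed body.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class PlainFetch {

    // Opens a GET connection that does NOT send Accept-Encoding,
    // i.e. it does not advertise gzip/deflate support to the server.
    static HttpURLConnection open(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(60000);
        conn.setReadTimeout(60000);
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL (assumption); pass the real page as the first argument.
        URL url = URI.create(args.length > 0 ? args[0] : "http://www.example.com/").toURL();
        HttpURLConnection conn = open(url);
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            // null here means the server sent the body uncompressed
            System.out.println("Content-Encoding: " + conn.getContentEncoding());
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```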
      

  2.   

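    About httpurlconnection.setRequestProperty("Accept-Encoding","gzip, deflate");: garbled text usually has one of two causes.
    One is a character-encoding mismatch: the bytes are decoded with a different charset than the one the page was written in.
    The other is that the data was transmitted in a compressed format. You can tell which one it is from the HTTP response headers: Content-Encoding shows whether the body is compressed, and the charset parameter of Content-Type shows the page's encoding.
    Here it is most likely compression, because your request headers announce that your program supports gzip.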
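The header check described in this answer can be sketched as follows: Content-Encoding reveals compression, and the charset parameter of Content-Type reveals the expected text encoding. This is a hedged sketch; the class and helper names (ResponseDiagnostics, charsetOf, isCompressed) are hypothetical, and the URL is a placeholder to replace with the real page. It sends the same Accept-Encoding header as the question so you can see what the server actually returns.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class ResponseDiagnostics {

    // Returns the charset named in a Content-Type value such as
    // "text/html; charset=UTF-8", or the fallback if none is given or it is unknown.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException ex) { // illegal or unsupported name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // True when Content-Encoding says the body is compressed.
    static boolean isCompressed(String contentEncoding) {
        if (contentEncoding == null) {
            return false;
        }
        String enc = contentEncoding.trim().toLowerCase(Locale.ROOT);
        return enc.equals("gzip") || enc.equals("x-gzip") || enc.equals("deflate");
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL (assumption); pass the real page as the first argument.
        URL url = URI.create(args.length > 0 ? args[0] : "http://www.example.com/").toURL();
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Same header as in the question, to see what the server sends back.
        conn.setRequestProperty("Accept-Encoding", "gzip, deflate");
        try {
            System.out.println("Status:           " + conn.getResponseCode());
            String type = conn.getContentType();
            String encoding = conn.getContentEncoding();
            System.out.println("Content-Type:     " + type);
            System.out.println("Content-Encoding: " + encoding);
            System.out.println("Compressed:       " + isCompressed(encoding));
            System.out.println("Charset:          " + charsetOf(type, StandardCharsets.UTF_8));
        } finally {
            conn.disconnect();
        }
    }
}
```

If "Compressed" prints true, the problem is compression; if it prints false but the charset differs from the one used in InputStreamReader, the problem is the encoding.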
      

  3.   

    2. Decompress the gzip data (this needs import java.util.zip.GZIPInputStream;):
    BufferedReader brr = new BufferedReader(new InputStreamReader(new GZIPInputStream(httpurlconnection.getInputStream()), "utf-8"));
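The GZIPInputStream line in this answer only works when the server really used gzip. A slightly more defensive sketch, assuming you keep the "gzip, deflate" request header, picks the decompressor from the response's Content-Encoding and then decodes the text with an explicit charset (UTF-8 here, as in the question). The names CompressedBodyReader, decompress and readBody are hypothetical; for "deflate" the sketch assumes the zlib-wrapped format that HTTP specifies, which is what InflaterInputStream expects by default.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

final class CompressedBodyReader {

    // Wraps the raw body stream in a decompressor matching Content-Encoding.
    static InputStream decompress(InputStream raw, String contentEncoding) throws IOException {
        String enc = contentEncoding == null ? "" : contentEncoding.trim().toLowerCase(Locale.ROOT);
        switch (enc) {
            case "gzip":
            case "x-gzip":
                return new GZIPInputStream(raw);
            case "deflate":
                // Assumes zlib-wrapped deflate data (the format HTTP specifies).
                return new InflaterInputStream(raw);
            default:
                return raw; // identity: body is not compressed
        }
    }

    // Decompresses first, then decodes the bytes to text with the given charset.
    static String readBody(InputStream raw, String contentEncoding, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(decompress(raw, contentEncoding), charset)) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL (assumption); pass the real page as the first argument.
        URL url = URI.create(args.length > 0 ? args[0] : "http://www.example.com/").toURL();
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        conn.setRequestProperty("Accept-Encoding", "gzip, deflate");
        try {
            // Assumption: the page is UTF-8, as in the question's code.
            String body = readBody(conn.getInputStream(), conn.getContentEncoding(), StandardCharsets.UTF_8);
            System.out.println(body);
        } finally {
            conn.disconnect();
        }
    }
}
```

Feeding gzip bytes to readBody with a null Content-Encoding (that is, skipping decompression) reproduces the garbled output from the question, because compressed bytes are not valid UTF-8 text.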
      

  4.   

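    Thanks, everyone. To be honest I don't know what gzip means; I copied these setRequestProperty calls from code my senior labmate wrote. But why do I get an HTTP 403 error when I leave the setRequestProperty calls out?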
      

  5.   


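    gzip is a data compression format. I think you can drop the other setRequestProperty calls; setting more of them just makes the program imitate a browser more closely. You really only need two:
      httpurlconnection.setRequestProperty("User-Agent","Mozilla/5.0");//pretend to be Firefox
      httpurlconnection.setRequestProperty("Accept-Encoding","gzip,deflate");//accept gzip-compressed data
    The User-Agent line is most likely the one that avoids the 403, since without it HttpURLConnection identifies itself as Java, which some servers reject. If you keep the Accept-Encoding line, you still have to decompress the response, for example with GZIPInputStream as in answer 3.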
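To check whether the 403 really depends on the User-Agent, a small sketch can send the request with and without that one header and print the status code. It assumes the 403 comes from the server rejecting Java's default agent string (when none is set, HttpURLConnection sends one of the form Java/&lt;version&gt;); the thread itself does not confirm this. StatusCheck, open and readAll are hypothetical names, and the URL is a placeholder. For a 4xx/5xx status, getInputStream() throws an IOException, so the error body has to be read from getErrorStream() instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class StatusCheck {

    // Sets only the User-Agent (or nothing, if null); all other headers stay at Java's defaults.
    static HttpURLConnection open(URL url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        return conn;
    }

    // Reads a stream fully (a plain loop, so it also works before Java 9's readAllBytes).
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL (assumption); pass the real page as the first argument.
        URL url = URI.create(args.length > 0 ? args[0] : "http://www.example.com/").toURL();
        // Try once with "Mozilla/5.0" and once with null (Java's default agent) and compare.
        HttpURLConnection conn = open(url, "Mozilla/5.0");
        try {
            int code = conn.getResponseCode(); // sends the request
            System.out.println("HTTP " + code);
            // 4xx/5xx bodies come from getErrorStream(); getInputStream() would throw.
            InputStream body = code >= 400 ? conn.getErrorStream() : conn.getInputStream();
            if (body != null) {
                try (InputStream in = body) {
                    System.out.println(new String(readAll(in), StandardCharsets.UTF_8));
                }
            }
        } finally {
            conn.disconnect();
        }
    }
}
```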