I'm using Java's HttpURLConnection to query Google Dictionary.
For example, take this link: http://translate.google.com/translate_a/t?client=t&text=%E8%AE%A1%E7%AE%97%E6%9C%BA&sl=zh&tl=en
Opened in IE, it shows:
[[["Computer","计算机",""]],[["名词",["Computer"]]],"zh-CN"]
If I download the same link with HttpURLConnection, the result is not only garbled but also differs from what IE displays:
[[["Cong $ Computer","计算�",""]],,"zh-CN"]
Below is the code of my download function. Could someone help me find what is going wrong?
public static String getHtmlText(String strUrl, int timeout) {
if (strUrl == null || strUrl.length() == 0) {
return null;
}
StringBuffer strHtml = null;
String strLine = "";
HttpURLConnection httpConnection = null; // this can be declared as HttpURLConnection
InputStream urlStream = null;
BufferedInputStream buff = null;
BufferedReader br = null;
Reader r = null;
boolean isError = false;
try {
// Connect to the network and fetch the page source
URL url = new URL(strUrl);
httpConnection = (HttpURLConnection) url.openConnection();
httpConnection.addRequestProperty("User-Agent", "IcewolfHttp/1.0");
httpConnection.addRequestProperty("Accept",
"www/source; text/html; image/gif; */*");
httpConnection.addRequestProperty("Accept-Language", "zh-CN");
httpConnection.addRequestProperty("Accept-Encoding",
"gzip, deflate");
httpConnection.addRequestProperty("Accept-Charset", "UTF-8");
httpConnection.setConnectTimeout(timeout);
httpConnection.setReadTimeout(timeout);
urlStream = httpConnection.getInputStream();
buff = new BufferedInputStream(urlStream);
r = new InputStreamReader(buff, "UTF-8");
// r = new InputStreamReader(buff, strEnCoding);
br = new BufferedReader(r);
strHtml = new StringBuffer("");
while ((strLine = br.readLine()) != null) {
strHtml.append(strLine + "\r\n");
}
} catch (java.lang.OutOfMemoryError out) {
// System.out.println("Memory usage: " + strHtml.capacity());
// out.printStackTrace();
} catch (Exception e) {
e.printStackTrace();
System.out.println(e.getClass() + ": failed to download page " + strUrl);
isError = true;
} finally {
try {
if (httpConnection != null)
httpConnection.disconnect();
if (br != null)
br.close();
if (r != null)
r.close();
if (buff != null)
buff.close();
if (urlStream != null)
urlStream.close();
} catch (Exception e) {
// System.out.println(e.getClass() + ": failed to close connection for page " + strUrl);
return null;
}
}
if (strHtml == null || isError)
return null;
return strHtml.toString();
}
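
One thing that may be worth ruling out: the function sends "Accept-Encoding: gzip, deflate" but then reads the stream directly as UTF-8 text. If the server honors that header, the body has to be decompressed before it is decoded. Whether this explains the output shown above is not established; the following is only a minimal sketch of decoding a body according to its Content-Encoding. The class name ResponseDecoder and the helpers unwrap, readUtf8 and gzip are hypothetical names made up for this illustration, and the "deflate" branch assumes a zlib-wrapped stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class ResponseDecoder {

    // Wrap the raw body stream according to the Content-Encoding response header.
    public static InputStream unwrap(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding == null) {
            return raw; // no header: body is not compressed
        }
        String enc = contentEncoding.trim().toLowerCase();
        if (enc.equals("gzip")) {
            return new GZIPInputStream(raw);
        }
        if (enc.equals("deflate")) {
            return new InflaterInputStream(raw); // assumption: zlib-wrapped deflate
        }
        return raw; // unknown encoding: pass through unchanged
    }

    // Read the whole body as bytes first, then decode once as UTF-8,
    // so multi-byte characters are never split across buffer boundaries.
    public static String readUtf8(InputStream body, String contentEncoding) throws IOException {
        try (InputStream in = unwrap(body, contentEncoding)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Test helper: gzip a string so the sketch can be tried without a network call.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // \u8ba1\u7b97\u673a is the query word from the link above
        String json = "[[[\"Computer\",\"\u8ba1\u7b97\u673a\",\"\"]]]";
        byte[] compressed = gzip(json);
        String decoded = readUtf8(new ByteArrayInputStream(compressed), "gzip");
        System.out.println(decoded.equals(json)); // prints "true"
    }
}
```

In getHtmlText this would roughly correspond to calling readUtf8(httpConnection.getInputStream(), httpConnection.getContentEncoding()) instead of wrapping the stream in an InputStreamReader directly. Alternatively, simply not sending the Accept-Encoding header avoids the need for decompression altogether.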