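I use the function below to fetch a web page as a string, write that string into a JDataStore database, and then read it back out of the database into a JTable. Pages encoded in GB2312 work fine, but UTF-8 pages (for example, Yahoo's search page) show a strange problem. When I open the page in IE, the source contains this line: <meta http-equiv="content-type" content="text/html; charset=UTF-8">. When I download the same page with my function, that line becomes <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">, and every place that should show Chinese text is replaced by blank characters.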
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

public static String getHtmlText(String strUrl) {
    if (strUrl == null || strUrl.length() == 0) {
        return null;
    }
    String strHtml = "";
    String strLine = "";
    try {
        // Connect to the server and fetch the page source
        URL url = new URL(strUrl);
        HttpURLConnection pconn = (HttpURLConnection) url.openConnection();
        pconn.addRequestProperty("User-Agent", "IcewolfHttp/1.0");
        pconn.addRequestProperty("Accept",
                "www/source; text/html; image/gif; */*");
        pconn.connect();
        //System.out.println("Connect status:"+pconn.getResponseCode());
        //if(HttpURLConnection.HTTP_ACCEPTED == pconn.getResponseCode())
        //InputStream in = url.openConnection();
        InputStream in = pconn.getInputStream();
        //System.out.println("Get status:"+pconn.getResponseCode());
        BufferedInputStream buff = new BufferedInputStream(in);
        // No charset is passed here, so the bytes are decoded with the JVM default charset
        Reader r = new InputStreamReader(buff);
        BufferedReader br = new BufferedReader(r);
        // Note: readLine() drops line terminators, so the page ends up on one line
        while ((strLine = br.readLine()) != null) {
            strHtml += strLine;
        }
        //strHtml = UnToWC(strHtml);
        br.close();
        buff.close();
        in.close();
        pconn.disconnect();
    }
    catch (MalformedURLException mfe) {
        System.err.println("url is not a parsable URL");
    }
    catch (IOException ioe) {
        System.err.println(ioe);
    }
    return strHtml;
}
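For reference, here is a minimal sketch of one way to decode the response with the charset the server declares, instead of relying on the JVM default as the function above does. It is not a confirmed fix for this thread. The class name CharsetAwareFetch, the helper charsetFromContentType, the UTF-8 fallback, and the extra Accept-Charset request header are all assumptions made for illustration. One possible explanation for the changed <meta> line, which nothing in this thread confirms, is that the server chooses the page encoding based on the request headers, which is why the sketch sends Accept-Charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetAwareFetch {

    // Hypothetical helper: extract the charset parameter from a Content-Type
    // value such as "text/html; charset=UTF-8". Returns the fallback when the
    // header is missing, has no charset parameter, or names an unknown charset.
    public static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for the "charset=" prefix
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Downloads a page and decodes it with the charset from the HTTP header.
    public static String getHtmlText(String strUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(strUrl).openConnection();
        conn.setRequestProperty("User-Agent", "IcewolfHttp/1.0");
        // Assumption: asking for UTF-8 may influence which encoding the server sends
        conn.setRequestProperty("Accept-Charset", "UTF-8");
        try {
            // Assumption: fall back to UTF-8 when the server declares no charset
            Charset cs = charsetFromContentType(conn.getContentType(), StandardCharsets.UTF_8);
            StringBuilder html = new StringBuilder();
            try (InputStream in = conn.getInputStream();
                 BufferedReader br = new BufferedReader(new InputStreamReader(in, cs))) {
                String line;
                while ((line = br.readLine()) != null) {
                    html.append(line).append('\n'); // keep line breaks
                }
            }
            return html.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // The same UTF-8 bytes decoded two ways: the first result is garbled,
        // the second restores the original two Chinese characters.
        byte[] utf8 = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.ISO_8859_1));
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```

If the HTTP header carries no charset, the page's own <meta> tag would be the next place to look; this sketch does not do that. The string must also survive the later steps intact, so the JDataStore column and the JTable font both need to be able to hold and display the characters.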
Solutions »
Is this what you mean? I tried it, and it didn't work.