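Today I ran a small experiment in Java to extract the content of a web page, but for some reason the Chinese text in the extracted content comes out garbled.
Any advice from the experts would be greatly appreciated~~~
The source code is as follows: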
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;

public class WebContent {
    // Downloads the page at htmlurl and returns its content as a single string.
    public String getOneHtml(String htmlurl) throws IOException {
        StringBuilder sb = new StringBuilder();
        try {
            URL url = new URL(htmlurl);
            // The charset given here must match the encoding the page is actually
            // served in; a mismatch turns Chinese text into garbled characters.
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), "gb2312"))) {
                String temp;
                while ((temp = in.readLine()) != null) {
                    sb.append(temp);
                }
            }
        } catch (MalformedURLException me) {
            System.out.println("Your URL is wrong, please input a valid one: " + me.getMessage());
            throw me;
        } catch (IOException e) {
            e.printStackTrace();
            throw e;
        }
        return sb.toString();
    }

    // Declares IOException because getOneHtml can throw it.
    public static void main(String[] args) throws IOException {
        WebContent web = new WebContent();
        String webcontent = web.getOneHtml("http://www.baidu.com/");
        System.out.println(webcontent);
    }
}
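Reply #5 is talking about garbled characters inside the source code itself.
That is a problem with the encoding used by the program you write the code in. A separate problem is that Chinese text displayed on a JSP page can also come out garbled.
You need to convert the encoding once when sending the data to the JSP, and then convert it again when displaying it.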
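
The "convert the encoding" step mentioned above might look roughly like this. It is only a sketch: it assumes the bytes were really UTF-8 but got decoded as ISO-8859-1 somewhere along the way, and which charsets are actually involved depends on your own setup. The class name TranscodeSketch and the method redecode are hypothetical names made up for this illustration. The Chinese sample text is written with Unicode escapes so the example does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TranscodeSketch {
    // Recovers the raw bytes by encoding with the charset that was wrongly
    // used for decoding, then decodes those bytes with the actual charset.
    // This only works if the wrong charset preserved every byte, which
    // ISO-8859-1 does because it maps each byte to exactly one character.
    public static String redecode(String garbled, Charset wrongCharset, Charset actualCharset) {
        byte[] raw = garbled.getBytes(wrongCharset);
        return new String(raw, actualCharset);
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the two-character Chinese word for "Chinese".
        String original = "\u4e2d\u6587";

        // Simulate the problem: UTF-8 bytes decoded with the wrong charset.
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        String garbled = new String(utf8Bytes, StandardCharsets.ISO_8859_1);

        // Undo it by re-decoding with the charset the bytes were written in.
        String repaired = redecode(garbled, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        System.out.println("garbled differs from original: " + !garbled.equals(original));
        System.out.println("repaired equals original: " + repaired.equals(original));
    }
}
```

If you can control where the bytes are first decoded (for example the charset passed to InputStreamReader in the code from the question), it is usually cleaner to use the right charset there than to repair the string afterwards.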