URL url=new URL(et.url);
URLConnection connection=url.openConnection();
connection.setConnectTimeout(2000);
connection.setReadTimeout(2000);
connection.connect();
//String tempStr=new String(connection.getInputStream());
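String encodingStr=connection.getContentType();
Very often encodingStr comes back as just text/html; without the trailing charset="xxx". From what I can tell, this may be caused by a space in the middle on some pages. But in that case, how do I get the page's encoding? I don't want to write the whole page source into a String first and then pattern-match it.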
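For the header side of the question, here is a minimal sketch of pulling the charset out of whatever getContentType() returns. It tolerates spaces and quotes around the value and returns null when no charset is present, so the caller knows it has to fall back to something else. The class name ContentTypeCharset and the method parse are made up for illustration; only java.util.regex and java.nio.charset.Charset are real APIs here.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK or of the original post.
class ContentTypeCharset {
    // Matches charset=xxx, charset="xxx" or charset = 'xxx', case-insensitively.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([^\\s;\"']+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in a Content-Type value, or null if none is usable. */
    static Charset parse(String contentType) {
        if (contentType == null) {
            return null; // the server sent no Content-Type at all
        }
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return null; // e.g. "text/html;" with no charset parameter
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("text/html; charset=UTF-8"));
        System.out.println(parse("text/html;"));
    }
}
```

With the code from the question this would be called as ContentTypeCharset.parse(connection.getContentType()). A null result means the header carried no charset, so the encoding has to come from the page itself.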
Read as far as <meta http-equiv="Content-Type" content="text/html; charset=gb2312" />,
then parse it with a regular expression and re-decode the content using that charset.
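The "read the meta tag, regex it, then re-decode" idea can be sketched roughly as follows. The trick assumed here is to look at the raw bytes through ISO-8859-1, which maps every byte to exactly one char, so the ASCII markup of the meta tag stays readable whatever the real encoding is. MetaCharsetSniffer, sniff and decode are hypothetical names, and the byte array is assumed to have been read from connection.getInputStream() already; the sketch does no networking.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK or of the original post.
class MetaCharsetSniffer {
    // Finds charset=... inside a <meta ...> tag, with or without quotes.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)", Pattern.CASE_INSENSITIVE);

    /** Looks for a meta charset in the given bytes; returns fallback if none is usable. */
    static Charset sniff(byte[] head, Charset fallback) {
        // ISO-8859-1 never fails to decode, so the ASCII markup survives intact.
        String ascii = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(ascii);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported name: fall through to the fallback.
            }
        }
        return fallback;
    }

    /** Decodes the page bytes with the sniffed charset. */
    static String decode(byte[] page, Charset fallback) {
        return new String(page, sniff(page, fallback));
    }

    public static void main(String[] args) {
        // Simulated page bytes in GB2312; the title is written with Unicode escapes.
        String html = "<head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=gb2312\" />"
                + "<title>\u65b0\u6d6a\u9996\u9875</title></head>";
        byte[] page = html.getBytes(Charset.forName("GB2312"));
        System.out.println(sniff(page, StandardCharsets.UTF_8));
        System.out.println(decode(page, StandardCharsets.UTF_8).equals(html));
    }
}
```

Since the meta tag normally sits near the top of the head section, it should be enough to pass only the first few kilobytes to sniff, which avoids holding the whole page as a String just to find the encoding.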
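// Needs java.io.BufferedReader, java.io.InputStreamReader, java.net.URL, java.net.URLConnection
URLConnection uc = new URL("http://www.sina.com.cn").openConnection();
uc.setConnectTimeout(10000);
uc.setDoOutput(true);
// No charset is passed to InputStreamReader, so the platform default decodes the bytes.
BufferedReader br = new BufferedReader(new InputStreamReader(uc.getInputStream()));
String str;
int i = 0;
while ((str = br.readLine()) != null) {
    System.out.println(str);
    i++;
    if (i > 5) { // stops once the sixth line has been printed
        break;
    }
}
br.close();
Reading the first few lines gives the following: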
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!--[30,69,1] published at 2010-06-04 17:30:22 from #150 by 185-->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
<title>新浪首页</title>
This page doesn't work: http://air.sohu.com comes out as nothing but garbled text.