Getting a web page's character encoding with URLConnection

URL url=new URL(et.url);
URLConnection connection=url.openConnection();
connection.setConnectTimeout(2000);
connection.setReadTimeout(2000);
connection.connect();

//String tempStr=new String(connection.getInputStream());

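String encodingStr=connection.getContentType();

Very often encodingStr comes back as just text/html; with no charset="xxx" after it. I looked into it a bit, and it may be caused by a space that some pages have in there. But in that case, how do I get the page's encoding? I don't want to write the whole page source into a String first and then match against it.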

Solutions


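Before relying on this, you need to know where HttpURLConnection#getContentType gets its value. It returns the value of the Content-Type header of the HTTP response. If the server did not set that header, or set it without a charset parameter, the encoding cannot be obtained this way.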
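To make the point about the header concrete, here is a minimal sketch of pulling the charset parameter out of whatever getContentType returns. The class name ContentTypeCharset and the method charsetFromContentType are made up for illustration; the pattern tolerates spaces around the equals sign and optional quotes, and the method returns null when there is no charset to find.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK.
final class ContentTypeCharset {
    // Matches a charset parameter, allowing spaces around '=' and optional quotes.
    private static final Pattern CHARSET_PARAM = Pattern.compile(
            "charset\\s*=\\s*[\"']?([^\\s;\"']+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset name from a Content-Type value, or null if none is present. */
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null; // the server sent no Content-Type header at all
        }
        Matcher m = CHARSET_PARAM.matcher(contentType);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(charsetFromContentType("text/html; charset=gb2312")); // prints gb2312
        System.out.println(charsetFromContentType("text/html"));                 // prints null
    }
}
```

In the code from the question this would be called as charsetFromContentType(connection.getContentType()). A null result means the header is not enough and the page itself has to be inspected.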
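It may be caused by a space. For some pages the encoding really cannot be obtained this way; Sina's home page, for example, does not report one even though the page does declare it. When that happens, read the first few lines of the page until you reach
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
then parse that line with a regular expression and decode the page with the charset it names.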
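import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

URLConnection uc = new URL("http://www.sina.com.cn").openConnection();
uc.setConnectTimeout(10000);
// No setDoOutput(true): that is only for requests that send a body.
// No charset is passed to InputStreamReader, so the platform default is used.
// The ASCII <meta> line is readable either way; non-ASCII text may not be.
try (BufferedReader br = new BufferedReader(new InputStreamReader(uc.getInputStream()))) {
    String str;
    int i = 0;
    // Print only the first 6 lines; the <meta> tag normally sits near the top.
    while (i < 6 && (str = br.readLine()) != null) {
        System.out.println(str);
        i++;
    }
}

This reads the first six lines of the page, which look like this: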
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
<title>新浪首页</title>
Then use a regular expression to pick out that meta tag, and you're done.
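A rough sketch of that regex step, assuming the declaration looks like the meta line printed above or like the shorter <meta charset="..."> form. MetaCharset and charsetFromHtml are hypothetical names chosen for this example, and the pattern is a simple heuristic, not a full HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK.
final class MetaCharset {
    // Finds "charset=..." inside a <meta ...> tag, with or without quotes.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([^\\s\"'/>;]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset declared in a <meta> tag of the given HTML, or null. */
    static String charsetFromHtml(String html) {
        Matcher m = META_CHARSET.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=gb2312\" />";
        System.out.println(charsetFromHtml(line)); // prints gb2312
    }
}
```

The lines read from the page can be fed to charsetFromHtml one at a time, stopping at the first non-null result.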
It doesn't work for this page, http://air.sohu.com: everything comes out garbled.
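I can't say for certain why that page comes out garbled. Two things worth checking are whether the text is being decoded with the platform default charset instead of the page's own, and whether the server is sending a compressed body. For the first case, here is a sketch that works on the raw bytes: it uses the charset from the header if there is one, otherwise looks for a meta declaration in the first few KB, and falls back to UTF-8. PageDecoder, detect and decode are hypothetical names, the 4096-byte limit and the UTF-8 fallback are assumptions, and this does nothing about compression.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK.
final class PageDecoder {
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([^\\s\"'/>;]+)", Pattern.CASE_INSENSITIVE);

    /** Picks a charset: header value first, then a <meta> declaration, then UTF-8. */
    static Charset detect(byte[] body, String headerCharset) {
        String name = headerCharset;
        if (name == null) {
            // ISO-8859-1 maps every byte to exactly one char, so ASCII markup survives intact.
            int limit = Math.min(body.length, 4096);
            String head = new String(body, 0, limit, StandardCharsets.ISO_8859_1);
            Matcher m = META_CHARSET.matcher(head);
            if (m.find()) {
                name = m.group(1);
            }
        }
        try {
            return name != null ? Charset.forName(name) : StandardCharsets.UTF_8;
        } catch (IllegalArgumentException e) {
            // Unknown or malformed charset name: fall back rather than fail.
            return StandardCharsets.UTF_8;
        }
    }

    /** Decodes the whole body with the detected charset. */
    static String decode(byte[] body, String headerCharset) {
        return new String(body, detect(body, headerCharset));
    }
}
```

The body has to be read as bytes from connection.getInputStream() first, not through a Reader, so that nothing is decoded before the charset is known.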