For web pages whose source begins with
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" >
the downloaded source contains garbled Chinese characters.
The code I am using is:
HttpWebRequest all_codeRequest = (HttpWebRequest)WebRequest.Create(web_url);
// Set some reasonable limits on resources used by this request
all_codeRequest.MaximumAutomaticRedirections = 4;
all_codeRequest.MaximumResponseHeadersLength = 4;
// Set credentials to use for this request.
all_codeRequest.Credentials = CredentialCache.DefaultCredentials;
HttpWebResponse all_codeResponse = (HttpWebResponse)all_codeRequest.GetResponse();
// Get the stream associated with the response.
Stream receiveStream = all_codeResponse.GetResponseStream();
// Pipes the stream to a higher level stream reader with the required encoding format.
StreamReader readStream = new StreamReader(receiveStream, Encoding.UTF8);
all_code = readStream.ReadToEnd();
readStream.Close();
I don't know how to fix this. Any help would be appreciated, thanks.
Check which charset the page declares, for example:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
<meta http-equiv="Content-Type" content="text/html; charset=gbk" />
Each charset has to be read with the matching encoding; otherwise the text comes out garbled.
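As a rough illustration of reading a page according to its declared charset, here is a minimal sketch. The class name `CharsetSniffer` and its methods `FindCharset` and `Decode` are hypothetical names made up for this example, not part of .NET. It assumes the page bytes have already been downloaded into a `byte[]`, and it only looks at the `charset=` token in the Content-Type header or in the first few kilobytes of the HTML. On .NET Core / .NET 5+, `gb2312` and `gbk` are only available after registering the code pages encoding provider; on .NET Framework they are built in.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class CharsetSniffer
{
    // Matches charset=... in either a Content-Type header or a <meta> tag.
    private static readonly Regex CharsetPattern =
        new Regex(@"charset\s*=\s*[""']?\s*([A-Za-z0-9_\-]+)", RegexOptions.IgnoreCase);

    // Returns the declared charset name, or null if none is found.
    public static string FindCharset(string text)
    {
        if (string.IsNullOrEmpty(text)) return null;
        Match m = CharsetPattern.Match(text);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Decodes raw page bytes using the header charset first,
    // then the <meta> charset, then the caller-supplied fallback.
    public static string Decode(byte[] body, string contentTypeHeader, Encoding fallback)
    {
        string name = FindCharset(contentTypeHeader);
        if (name == null)
        {
            // ISO-8859-1 maps every byte to one char, so the ASCII-only
            // <meta> tag can be searched before the real encoding is known.
            int headLength = Math.Min(body.Length, 4096);
            string head = Encoding.GetEncoding("iso-8859-1").GetString(body, 0, headLength);
            name = FindCharset(head);
        }

        Encoding encoding = fallback;
        if (name != null)
        {
            try { encoding = Encoding.GetEncoding(name); }
            catch (ArgumentException) { /* unknown charset name: keep the fallback */ }
        }
        return encoding.GetString(body);
    }
}
```

With the code from the question, the header string could be taken from `all_codeResponse.ContentType`, and the bytes read from `receiveStream` into a buffer before decoding, instead of wrapping the stream in a `StreamReader` with a fixed `Encoding.UTF8`.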
Both pages declare
<META http-equiv="Content-Type" content="text/html; charset=gb2312">
The only difference is that one of them has <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" > in front of it. The page with the DOCTYPE downloads with garbled Chinese,
while the one without that prefix displays correctly.
I don't know why.
request.Timeout = 6000; // milliseconds
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream receiveStream = response.GetResponseStream();
// Decode the response bytes as gb2312
StreamReader readStream = new StreamReader(receiveStream, Encoding.GetEncoding("gb2312"));
netHtml = readStream.ReadToEnd();
readStream.Close();
response.Close();
// Write the string to a file
StreamWriter writer = File.CreateText(FILE_NAME);
writer.WriteLine(netHtml);
writer.Close();
Change the encoding passed here, Encoding.GetEncoding("gb2312"), to the default one.