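I'm writing a program that needs to fetch a web page's source. Calling idhttp.get(http://alexa.chinaz.com/?domain=baidu.com) returns <script language="javascript">alert("您已经被禁止使用我们的查询服务。\n\n有任何疑问请联系 QQ7679512!");window.location.href("/Index.asp");</script> (the alert says "You have been banned from using our query service. If you have any questions, contact QQ7679512!"). But when I open the same URL in a webbrowser control, the source I get back is the normal page. What is going on here?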
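Capture the request the browser sends, then set the idhttp properties to match the header information in the captured packet, and Get again.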
procedure TForm1.Button1Click(Sender: TObject);
var
  sHtml: string;
begin
  // Mimic the headers a browser (IE 6) sends, so the site does not reject the request
  IdHTTP1.Request.UserAgent := 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';
  IdHTTP1.Request.Accept := 'image/gif, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*';
  IdHTTP1.Request.AcceptLanguage := 'zh-cn';
  IdHTTP1.Request.Connection := 'Keep-Alive';
  try
    sHtml := IdHTTP1.Get('http://alexa.chinaz.com/?domain=baidu.com');
  except
    // Report the failure instead of silently swallowing it
    on E: Exception do
      ShowMessage('Get failed: ' + E.Message);
  end;
  // Memo1.Lines.Add(sHtml); // show the page source in the Memo
end;
I just tested the code above and it returns the page source normally.
Tested OK with D7 + Indy 9.
idhttp1.Request.AcceptLanguage:='zh-cn';
idhttp1.Request.AcceptEncoding:='gzip, deflate';
idhttp1.Request.UserAgent:='Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';
idhttp1.Request.Host:='alexa.chinaz.com';
idhttp1.Request.Connection:='Keep-Alive';
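These are my Indy settings, and what comes back is '?'. If I remove idhttp1.Request.AcceptEncoding:='gzip, deflate'; the source is returned normally. The packet I captured does contain Accept-Encoding: gzip, deflate, so why does it fail when I send that header?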
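Sending that header tells the server you can accept compressed responses, but your IdHTTP does not have that capability; IE does. The server therefore replies with a gzip-compressed body, IdHTTP hands it back undecompressed, and you see '?'. That is why removing the header gets you the source normally. After fetching it you still have to decode it: run it through UTF8Decode and the garbled characters come out right. At the moment I have put together one for logging in to the space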
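If you do want to keep accepting gzip, here is a rough sketch of one way it could look. Note the assumptions: it uses Indy 10 rather than the Indy 9 tested above (as far as I know Indy 9's TIdHTTP has no decompression support), it assumes Delphi 7 with ANSI strings, and it assumes the page is served as UTF-8. FetchPage is just a name I made up for the example. With a TIdCompressorZLib assigned to the Compressor property, Indy 10 should advertise and decompress gzip/deflate by itself, so Request.AcceptEncoding is not set by hand.

```pascal
program FetchGzip;
{$APPTYPE CONSOLE}

uses
  Classes, SysUtils, IdHTTP, IdCompressorZLib;

// Hypothetical helper: fetch AUrl and return its source as an ANSI string.
// Assumes Indy 10, Delphi 7 (ANSI strings) and a UTF-8 encoded page.
function FetchPage(const AUrl: string): string;
var
  Http: TIdHTTP;
  Raw: TStringStream;
begin
  Http := TIdHTTP.Create(nil);
  Raw := TStringStream.Create('');
  try
    // The compressor is owned by Http, so it is freed together with it.
    Http.Compressor := TIdCompressorZLib.Create(Http);
    Http.Request.UserAgent := 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';
    Http.Request.AcceptLanguage := 'zh-cn';
    // Read the raw (already decompressed) bytes into the stream.
    Http.Get(AUrl, Raw);
    // Convert the UTF-8 bytes to the local ANSI code page.
    Result := Utf8ToAnsi(Raw.DataString);
  finally
    Raw.Free;
    Http.Free;
  end;
end;

begin
  try
    WriteLn(FetchPage('http://alexa.chinaz.com/?domain=baidu.com'));
  except
    on E: Exception do
      WriteLn('Request failed: ', E.Message);
  end;
end.
```

If you have to stay on Indy 9, the simpler route is the one already described: leave Accept-Encoding out so the server sends plain text, then decode the result.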