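http://city.qzone.qq.com/html/user/searchpro.htm#mod=0&act=city&nl=3&cd=320100&pg=3
I can't get the data from the URL above with the ordinary XMLHTTP approach. Posting with Inet only returns a pile of JS code that has almost nothing to do with the page you see when you open it, and the download API function gives the same result as Inet. The WebBrowser control does solve the problem, but it is far too slow. Does anyone know a way to download the page source directly? And why doesn't XMLHTTP work even with POST?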
The code I use is:
' References: Microsoft CDO for Windows 2000 Library and Microsoft ActiveX Data Objects 2.8 Library
Private Sub Command1_Click()
    Dim a As New CDO.Message
    Dim b As ADODB.Stream
    a.CreateMHTMLBody "http://www.****.com", cdoSuppressNone, "", ""
    Set b = a.GetStream
    b.SaveToFile "c:\1.mht" ' save to drive C as 1.mht
    MsgBox "OK"
End Sub
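Private Sub Command1_Click()
    Dim a As New CDO.Message
    Dim b As ADODB.Stream
    a.CreateMHTMLBody "http://city.qzone.qq.com/html/user/searchpro.htm#mod=0&act=city&nl=3&cd=320102", cdoSuppressNone, "", ""
    Set b = a.GetStream
    b.SaveToFile "c:\1.mht"
    MsgBox "OK"
End Sub
This method doesn't work. When I open the saved page, the lower-right part is blank. The paging section seems to be loaded via Ajax rather than being part of an ordinary page.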
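Dim s As String
With CreateObject("InternetExplorer.Application")
    .Navigate "http://city.qzone.qq.com/html/user/searchpro.htm#mod=0&act=city&nl=3&cd=320100"
    ' Wait until the document has finished loading (READYSTATE_COMPLETE = 4)
    Do Until .ReadyState = 4
        DoEvents
    Loop
    s = .Document.documentElement.outerHTML
    .Quit
End With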
Private Sub Command1_Click()
    Dim IEobj As Object
    Dim Vdoc As Object
    Set IEobj = CreateObject("InternetExplorer.Application")
    IEobj.Navigate "http://city.qzone.qq.com/html/user/searchpro.htm#mod=0&act=city&nl=3&cd=320100"
    Do While IEobj.Busy Or IEobj.ReadyState <> 4 ' wait for the page to load
        DoEvents
    Loop
    Set Vdoc = IEobj.document
    Text1 = Vdoc.body.innerHTML
End Sub
Sub DownFile(URL As String)
    Dim bytes() As Byte
    Dim fName As String
    bytes() = Inet1.OpenURL(URL, icByteArray) ' download as raw bytes
    fName = "123.TXT" ' saved on drive C
    Open "C:\" & fName For Binary Access Write As #1
    Put #1, , bytes()
    Close #1
End Sub

Private Sub Form_Load()
    DownFile "http://city.qzone.qq.com/html/user/searchpro.htm#mod=0&act=city&nl=3&cd=320100&pg=3"
End Sub
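A note on the URL passed to DownFile: everything after the # is a fragment, and HTTP clients do not send the fragment to the server. So Inet, XMLHTTP and the download API would all be requesting plain searchpro.htm, which could explain why they only return the script shell. The sketch below just separates the two parts so you can see what actually goes over the wire. The helper names SplitUrlFragment and GetParam are made up for illustration and are not from the thread.

```vb
' Splits a URL at the first "#".
' requestPart  = what an HTTP client sends to the server
' fragmentPart = what stays on the client (read only by the page's own scripts)
Public Sub SplitUrlFragment(ByVal url As String, _
                            ByRef requestPart As String, _
                            ByRef fragmentPart As String)
    Dim p As Long
    p = InStr(url, "#")
    If p = 0 Then
        requestPart = url
        fragmentPart = ""
    Else
        requestPart = Left$(url, p - 1)
        fragmentPart = Mid$(url, p + 1)
    End If
End Sub

' Returns the value of one name=value pair from an "&"-separated string,
' or "" when the name is not present.
Public Function GetParam(ByVal query As String, ByVal name As String) As String
    Dim parts() As String
    Dim kv() As String
    Dim i As Long
    parts = Split(query, "&")
    For i = LBound(parts) To UBound(parts)
        kv = Split(parts(i), "=", 2)
        If UBound(kv) = 1 Then
            If kv(0) = name Then
                GetParam = kv(1)
                Exit Function
            End If
        End If
    Next i
End Function
```

For example:

```vb
Dim req As String, frag As String
SplitUrlFragment "http://city.qzone.qq.com/html/user/searchpro.htm#mod=0&act=city&nl=3&cd=320100&pg=3", req, frag
Debug.Print req                  ' http://city.qzone.qq.com/html/user/searchpro.htm
Debug.Print GetParam(frag, "cd") ' 320100
Debug.Print GetParam(frag, "pg") ' 3
```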
GET /json.php?mod=sososearch&act=page&type=city&jsontype=str&callback=searchProCb&nl=3&cd=320100&pg=2 HTTP/1.1
Accept: */*
Referer: http://city.qzone.qq.com/html/user/searchpro.htm
Accept-Language: zh-cn
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; AhnLab:APG=2^396822333^22;; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; TheWorld)
Host: city.qzone.qq.com
Connection: Keep-Alive
Cookie: randomSeed=4293841; save_tips=1; adSP=SAxRoVctJSL4dNhzC+OoUSq6r+zwdM9rNU5XjQv4Lw0=_8652_326830_1252676665_; adVer=1777; ac=1,006,006; o_cookie=171977759; pvid=8859942750; flv=10.0; r_cookie=982132854913; pt2gguin=o0171977759; ptcz=ec0054fc17f7b564c57b7bd61897f0b7f9b59d6e1418f3c8693f0cf40d89ebd3; icache=GLACBFAAF; avid=mnW13exIprbp6ReCdRs1SGFh+KkcYsydF9IuWUWX6CSjbK0XV/TtWe+0w3j+8OP4UbiKSJPely0=; randomSeed=8204184; gkey=WIb0wSUxRJx56u44HRUadw7WJE8OLMGeIh53rzTFubtOqPAxtRrBI1irHZQJcIv%2F; SortType=1; comment_skey=a87fdf9e432312a09c5f6280d1c1f813; comment_uin=385762552 sysdzw; ispai2_65593220=2; ispai2_275309361=2; uin_cookie=171977759; euin_cookie=AQAYCpnKn16lEncQtm+XXX6hbTtjnrIKowcaGAAAAADO5balLxaoGPRwe6eIYHmbT+WRUQ==; ssid=s434750928; qzone_city_key=RP%3D1

Received: return code 0x00000000
HTTP/1.1 200 OK
Date: Sat, 12 Sep 2009 16:55:26 GMT
Server: Apache
Set-Cookie: zzpaneluin=deleted; expires=Fri, 12-Sep-2008 16:55:25 GMT; path=/; domain=qq.com
Set-Cookie: zzpanelkey=deleted; expires=Fri, 12-Sep-2008 16:55:25 GMT; path=/; domain=qq.com
Set-Cookie: uin=deleted; expires=Fri, 12-Sep-2008 16:55:25 GMT; path=/; domain=qq.com
Set-Cookie: skey=deleted; expires=Fri, 12-Sep-2008 16:55:25 GMT; path=/; domain=qq.com
Cache-Control: max-age=300, max-age=600
Expires: Sat, 12 Sep 2009 17:05:26 GMT
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 2271
Connection: close
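Content-Type: application/x-javascript; charset=utf-8

I then went on to use Inet's Execute method, setting the URL, the data to send, the referer and so on, and posted the request. The result still does not contain the main data (QQ numbers, nicknames, avatar URLs and so on). After decoding, it looks like this: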
searchProCb(showSrchResult({"responseHeader": {"status":"0","QTime":"8"},"response": {"numFound":"10000000","currentNum":"10000","results":[],"CorrectPin": [],"SimilarWords": []}}););
That is exactly what I get by downloading this URL directly: http://city.qzone.qq.com/json.php?mod=sososearch&act=page&type=city&jsontype=str&callback=searchProCb&nl=3&cd=320100&pg=2. In other words, the server did not accept what I posted.
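The decoded body above is JSONP, meaning a JSON object wrapped in callback calls (searchProCb and showSrchResult here). To work with it from VB, the wrapper has to come off first. Below is a minimal sketch that keeps everything from the first "{" to the last "}". It assumes the payload is a single JSON object, and ExtractJson is a made-up name, not something from the page's scripts.

```vb
' Returns the JSON object embedded in a JSONP response,
' or "" if no "{...}" pair is found.
Public Function ExtractJson(ByVal jsonp As String) As String
    Dim firstBrace As Long
    Dim lastBrace As Long
    firstBrace = InStr(jsonp, "{")
    lastBrace = InStrRev(jsonp, "}")
    If firstBrace > 0 And lastBrace > firstBrace Then
        ExtractJson = Mid$(jsonp, firstBrace, lastBrace - firstBrace + 1)
    End If
End Function
```

For example:

```vb
Debug.Print ExtractJson("searchProCb(showSrchResult({""results"":[]}););")
' {"results":[]}
```

Note that in the sample above "results" is an empty array, so even after unwrapping there are no QQ numbers or nicknames to read out.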
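The underlying reason is that the large block of data in the lower right is produced by a series of JS scripts. When those scripts run (each time you click "next page"), they pass only those few parameters to the server and get back the contents of this json file. But when I reproduce the same environment with Inet, I don't get the data. I described this in detail in post #11.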
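Two things in the capture may be worth checking before giving up on a direct request. First, the browser's request is a GET, not a POST, so a replay would probably need to be a GET as well. Second, the response deletes the uin and skey cookies, which might mean the server did not see a logged-in session. That second point is only a guess from the headers. The sketch below replays the GET with WinHttp.WinHttpRequest.5.1 instead of Inet, passing the Referer and whatever Cookie string you captured yourself. FetchJsonp is a hypothetical name, and there is no guarantee the server will return results for it.

```vb
' Sends a GET with the given Referer and Cookie headers and returns the body text.
' Late-bound, so no project reference is needed.
Public Function FetchJsonp(ByVal url As String, _
                           ByVal referer As String, _
                           ByVal cookie As String) As String
    Dim http As Object
    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
    http.Open "GET", url, False ' False = synchronous
    http.SetRequestHeader "Accept", "*/*"
    http.SetRequestHeader "Referer", referer
    If Len(cookie) > 0 Then http.SetRequestHeader "Cookie", cookie
    ' No Accept-Encoding header is set here, so the server is not
    ' invited to gzip the body and no decompression step is needed.
    http.Send
    If http.Status = 200 Then FetchJsonp = http.ResponseText
End Function
```

Usage, with the cookie string left for you to fill in from your own capture:

```vb
Dim body As String
body = FetchJsonp("http://city.qzone.qq.com/json.php?mod=sososearch&act=page&type=city&jsontype=str&callback=searchProCb&nl=3&cd=320100&pg=2", _
                  "http://city.qzone.qq.com/html/user/searchpro.htm", _
                  "")
Debug.Print body
```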