Regex help

(?is)<div class="maincon">.*?</div>

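The reply at #3 understood exactly what I meant, and its output is what I asked for, but I need it written in VB because I am using a VBS script.
The goal is to scrape listing information from 58.com. I am posting my code below for anyone who is interested.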
On Error Resume Next
' Parameter settings
Const j = 100 ' Stop after this many loops
' Start
Dim IE
Set IE = CreateObject("InternetExplorer.Application")
Set h = CreateObject("Microsoft.XMLHTTP")
'for k = 1 to j
Const url = "http://mdj.58.com/zufang/0/" ' Target board URL
h.open "get",url,false
h.send
yuan = h.ResponseText
list= url_list(yuan)
' Test output
Set FSO = CreateObject("scripting.filesystemobject")
set strfile = FSO.CreateTextFile("temp.txt",true,true)
strfile.write list
strfile.Close ' Flush and release the file handle
Set FSO = Nothing
myarray=split(list,",")
num=ubound(myarray)
'for i = 2 to num -1 ' Skip the pinned posts
call getHtmlStr(myarray(2)) ' Call the posting procedure (meant to run inside the loop)
createobject("wscript.shell").run "cmd /c taskkill /im iexplore.exe /f",0,true
'next
'WScript.Sleep 1800000 ' One round is done; pause for 30 minutes
'next
'createobject("wscript.shell").run "cmd /c taskkill /im iexplore.exe /f",0,true
function url_list(str) ' Extract all listing URLs in one batch
dim list
Set regEx = New RegExp
regEx.Global = True
regEx.Pattern="http:\/\/mdj\.58\.com\/zufang\/[^>]+\.shtml"
Set Matches = regEx.Execute(str)
For Each Match in Matches
list=list&Match.Value&","
Next
url_list=list
end function

function info(str,reg) ' Content to extract
Set regEx = New RegExp
regEx.Global = True
regEx.Pattern=reg
Set Matches = regEx.Execute(str)
For Each Match in Matches
i=i&Match.Value
Next
info=i
end function

Function getHtmlStr(strUrl) ' Fetch the page source
' Note: the URL is hard-coded for testing; strUrl is not used yet
h.open "get","http://mdj.58.com/zufang/7629892349318x.shtml",false
h.send
a=h.ResponseText
' First-pass extraction (results still contain HTML markup)
bt=info(a,"<h1>[^>]+<\/h1>") ' Title
zj=info(a,"<li><i>租　金：<\/i><span class=""pri"">[^>]+<\/li>") ' Rent
hx=info(a,"<i>户　型：<\/i>[^>]+<\/li>") ' Layout
lc=info(a,"<li><i>类　型：<\/i>[^>]+<\/li>") ' Type
pz=info(a,"var tmp = '[^>]+'; document.write") ' Value assigned to tmp before document.write
qy=info(a,"<li><i>区　域：<\/i>[^>]+<\/li>") ' District
xq=info(a,"<li><i>小　区：<\/i>[^>]+<\/li>") ' Residential complex
xm=info(a,"username:'[^>]+',") ' Poster's user name
dh=info(a,"<img src='http:\/\/image.58.com\/showphone.aspx\?t=v55\&v=[^>]+' \/>") ' Phone number image
nr=info(a,"<div class=""maincon"">[^>]+<\/div>") ' Main content
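' Test output
Set FSO = CreateObject("scripting.filesystemobject")
set strfile = FSO.CreateTextFile("info.txt",true,true)
strfile.write nr
strfile.Close ' Flush and release the file handle
Set FSO = Nothing
' After tidying up the information collected above, submit it to the interface file, which writes it straight into the database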
End Function
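The code above is not finished. My plan is: first extract the URLs of all listings from the list page (this step is done), then loop over each URL, extract and tidy up its content, and submit it to an interface file that receives the data and writes it straight into the database. Right now I am stuck on extracting the content of each URL. My approach needs two passes: the first pass still returns HTML markup, so a second pass is required to clean it up. Is there a way to extract everything I need in a single pass? Any pointers would be appreciated. This is a VBS script.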
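
One possible way to get the inner text in a single pass is a capture group: wrap the part you want in parentheses and read it from SubMatches instead of Match.Value. The VBScript RegExp object does not accept the inline (?is) flags from the pattern at the top, so this sketch sets IgnoreCase instead and uses [\s\S]*? where the original relies on .*? matching across line breaks. ExtractFirst is a helper name made up for this example, and the sample HTML is invented, so treat it as a rough sketch rather than something tested against the real 58.com pages.

```vbscript
Option Explicit

' Hypothetical helper: returns the first capture group of the first match,
' or "" when the pattern does not match. The pattern must contain at least
' one pair of parentheses, otherwise SubMatches(0) does not exist.
Function ExtractFirst(html, pattern)
    Dim re, matches
    Set re = New RegExp
    re.IgnoreCase = True   ' stands in for the (?i) inline flag
    re.Global = False      ' only the first match is needed
    re.Pattern = pattern
    Set matches = re.Execute(html)
    If matches.Count > 0 Then
        ExtractFirst = matches(0).SubMatches(0)
    Else
        ExtractFirst = ""
    End If
End Function

' Invented sample markup, only to show the call shape
Dim sample
sample = "<h1>Two-room flat</h1>" & _
         "<div class=""maincon"">Near the station." & vbCrLf & _
         "Available now.</div>"

' [\s\S]*? matches any character, line breaks included, as few as possible
WScript.Echo ExtractFirst(sample, "<h1>([\s\S]*?)</h1>")
WScript.Echo ExtractFirst(sample, "<div class=""maincon"">([\s\S]*?)</div>")
```

With this shape, each field in getHtmlStr would need only one call, for example nr = ExtractFirst(a, "<div class=""maincon"">([\s\S]*?)</div>"). If the captured text itself contains nested tags, they will still be in the result and would need separate stripping.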