VB6。两个文本框,一个命令按钮。
Text1.text中的网页源文件: 计算:分数 </DIV> </DIV>">
</form>
<div class="content">
<div class="pic"> <img src="http://bbs.sina.com/and/di.djd?iid=12345566666666666" alt="英语" width="131" height="176" > </div>
<div class="info">
<p class="title">英语 <a href="http://bbs.sa.com/and/drr=1234556retertretr6666666" onclick="asdsfdsffddffdf;"> <img src=images2009/buyebook.gif style='vertical-align:middle;margin-left:15px'> </a> <a href="http://b.a.com/ad"> <img src=http://b.a.com/ad.ja style='vertical-align:middle;margin-left:15px'> </a> </p>
<p class="st">计算:分数 <span>| </span>听力:98 <span>| </span>写作:99 最终想在Text2.text中显示出
英语|计算:分数|听力:98|写作:99
求准确的正则表达式。部分源码参见
http://topic.csdn.net/u/20100307/05/c0dcf6b9-4848-49d1-96c7-e9573b9b77ae.html
Text1.text中的网页源文件: 计算:分数 </DIV> </DIV>">
</form>
<div class="content">
<div class="pic"> <img src="http://bbs.sina.com/and/di.djd?iid=12345566666666666" alt="英语" width="131" height="176" > </div>
<div class="info">
<p class="title">英语 <a href="http://bbs.sa.com/and/drr=1234556retertretr6666666" onclick="asdsfdsffddffdf;"> <img src=images2009/buyebook.gif style='vertical-align:middle;margin-left:15px'> </a> <a href="http://b.a.com/ad"> <img src=http://b.a.com/ad.ja style='vertical-align:middle;margin-left:15px'> </a> </p>
<p class="st">计算:分数 <span>| </span>听力:98 <span>| </span>写作:99 最终想在Text2.text中显示出
英语|计算:分数|听力:98|写作:99
求准确的正则表达式。部分源码参见
http://topic.csdn.net/u/20100307/05/c0dcf6b9-4848-49d1-96c7-e9573b9b77ae.html
[\u4e00-\u9fa5]+(?::\d*)?
[\u4e00-\u9fa5:]+\d*
MsgBox (RegExpTest2(">([^<]+)<", Text1.Text)) '> 和<间的内容
End Sub
Function RegExpTest2(patrn, strng)
Dim regEx, Match As Match, Matches ' Create variable.
Dim RetStr As String, strSubValue
Set regEx = New RegExp ' Create a regular expression.
regEx.Pattern = patrn ' Set pattern.
regEx.IgnoreCase = True ' Set case insensitivity.
regEx.Global = True ' Set global applicability.
Set Matches = regEx.Execute(strng) ' Execute search.
For Each Match In Matches ' Iterate Matches collection.
strSubValue = Trim$(Match.SubMatches(0)) '去首尾空格
strSubValue = Replace$(strSubValue, vbCrLf, "") '去回车换行符
If (Len(strSubValue) > 0) Then '有有效内容
' RetStr = RetStr & "Match found at position "
' RetStr = RetStr & Match.FirstIndex & ". Value is '"
' RetStr = RetStr & strSubValue & "'." & vbCrLf
RetStr = RetStr & strSubValue
End If
Next
RegExpTest2 = RetStr
End Function
"> 英语计算:分数|听力:98|
设想的:
英语|计算:分数|听力:98|写作:99
当网页源文件的数据有不同一级时,只能得到:
">
比如计算:分数</DIV></DIV>">
</form>
<div class="content">
<div class="ttt"><img src="http://ddddd.dddd.com/ddddd/ddddd.dll?ddd=D98313330373ttt63162626ttt6160tttt659" alt="英语" width="131" height="176" ></div>
<div class="ttt">
<p class="title">英语<a href="http://ttttt.tttttttt.com/ttttinfo.tttt?tttttttttttt" onclick="tttVitttnfo(t);"><ttt src=tttttttttt/butttttt.gif style='vertical-align:middle;margin-left:15px'></a><a href="http://tttt.tttttttt.com/ttttttttpingtttt.aspx?tttt=tttttttt"><ttt src=tttttttttt/ttttt_ttttt_-if.gif style='vertical-align:middle;margin-left:15px'></a></p>
<p class="tt">计算:分数<span>|</span>>听力:98 <span>|</span>>写作:99 <
计算:分数</DIV></DIV>">
</form>
<div class="content">
<div class="ttt"><img src="http://ddddd.dddd.com/ddddd/ddddd.dll?ddd=D98313330373ttt63162626ttt6160tttt659" alt="英语" width="131" height="176" ></div>
<div class="ttt">
<p class="title">英语<a href="http://ttttt.tttttttt.com/ttttinfo.tttt?tttttttttttt" onclick="tttVitttnfo(t);"><ttt src=tttttttttt/butttttt.gif style='vertical-align:middle;margin-left:15px'></a><a href="http://tttt.tttttttt.com/ttttttttpingtttt.aspx?tttt=tttttttt"><ttt src=tttttttttt/ttttt_ttttt_-if.gif style='vertical-align:middle;margin-left:15px'></a></p>
<p class="tt">计算:分数<span>|</span>>听力:98 <span>|</span>>写作:99 <
Private Sub Form_Load()
Dim s As String, ss As String
s = "<dslkjfljfjfk><aklsdjljfdjfoiuofiufoiuofiuofiuofiuofuofuoi??>sdk<dkjofjoifuoiufoiufoufoiuofuofiu><jdoiufoiufouffoiufoiu>地方军阀三剑客《<dlkfjjfsjfl>"
Do
ss = CutString(1, s, "<", ">")
s = Replace(s, ss, "")
s = Replace(s, "<>", "", , 1)
Loop While CutString(1, s, "<", ">") <> ""
Debug.Print s
End Sub
'文本截取函数
Public Function CutString(StartNum As Long, InPutString As String, LeftString As String, RightString As String)
On Error Resume Next
Dim StrLine As Long, StrLine2 As Long
StrLine = InStr(StartNum, InPutString, LeftString) + Len(LeftString)
StrLine2 = InStr(StrLine, InPutString, RightString)
CutString = Mid(InPutString, StrLine, StrLine2 - StrLine)
End Function去除所有"<>"中间的内容
s = Replace(s, "<>", "")
就正常了
试试这个:对于本例的执行结果是:
计算:分数">
与预想的还有差距。能否修改一下,以准确命中。谢谢。
Set myRegExp = New RegExp
myRegExp.Pattern = """title"">+(.*?)<[\s\S]*=""tt"">+(.*?)<+(?:.*?)>+(.*?)<+(?:.*?)>+(.*?)<+(?:.*?)>+(.*?)<(?:.*?)>+(.*?)<+"但如果变化真的太大就得另想办法了。
上面不能完全替换 可能只是你取其中一部分不规范造成的,如果完整的一个页面可能就ok了
你继续试试下面的代码,应该可以达到你的要求Private Sub Form_Load()
Dim strData$, strResult$
Dim reg As Object
Dim matchs As Object, match As Object
Set reg = CreateObject("vbscript.regexp")
strData = "网页代码"
reg.Global = True
reg.IgnoreCase = True
reg.Pattern = "[\u4e00-\u9fa5:]+\d*"
Set matchs = reg.Execute(strData)
For Each match In matchs
If InStr(strResult, match & "|") = 0 Then strResult = strResult & match & "|"
Next
If strResult <> "" Then strResult = Left(strResult, Len(strResult) - 1)
MsgBox strResult
End Sub