如何对这样的网页源文件进行正则处理。

VB6。两个文本框，一个命令按钮。
Text1.text中的网页源文件：计算：分数 </DIV> </DIV>">
</form>
<div class="content">
<div class="pic"> <img src="http://bbs.sina.com/and/di.djd?iid=12345566666666666" alt="英语" width="131" height="176" > </div>
<div class="info">
<p class="title">英语 <a href="http://bbs.sa.com/and/drr=1234556retertretr6666666" onclick="asdsfdsffddffdf;"> <img src=images2009/buyebook.gif style='vertical-align:middle;margin-left:15px'> </a> <a href="http://b.a.com/ad"> <img src=http://b.a.com/ad.ja style='vertical-align:middle;margin-left:15px'> </a> </p>
<p class="st">计算：分数 <span>| </span>听力：98 <span>| </span>写作：99 最终想在Text2.text中显示出
英语｜计算：分数｜听力：98｜写作：99
求准确的正则表达式。部分源码参见
http://topic.csdn.net/u/20100307/05/c0dcf6b9-4848-49d1-96c7-e9573b9b77ae.html

解决方案 »

免费领取超大流量手机卡，每月29元包185G流量+100分钟通话, 中国电信官方发货

本例就是要获取“>　 <”之间的内容。VB6有没有去掉htlm标识比较简单的方法？即把 <a href="http://bbs.sa.com/and/drr=1234556retertretr6666666"  onclick="asdsfdsffddffdf;"> 、<img src="http://bbs.sina.com/and/di.djd?iid=12345566666666666" alt="英语"  width="131" height="176" > </div>、<div class="content"> 之类的htlm内容去掉。
那个匹配不到是因为楼上同学的正则有问题，换成下面的就没ok了
[\u4e00-\u9fa5]+(?:：\d*)?
或者这样写：
[\u4e00-\u9fa5：]+\d*
其实就是加个()把要提取的SubMatches标识出来Private Sub Form_Load()
    MsgBox (RegExpTest2(">([^<]+)<", Text1.Text))   '> 和<间的内容
End Sub
Function RegExpTest2(patrn, strng)
  Dim regEx, Match As Match, Matches      ' Create variable.
  Dim RetStr As String, strSubValue
  Set regEx = New RegExp         ' Create a regular expression.
  regEx.Pattern = patrn         ' Set pattern.
  regEx.IgnoreCase = True         ' Set case insensitivity.
  regEx.Global = True         ' Set global applicability.
  Set Matches = regEx.Execute(strng)   ' Execute search.
  For Each Match In Matches      ' Iterate Matches collection.
    strSubValue = Trim$(Match.SubMatches(0))    '去首尾空格
    strSubValue = Replace$(strSubValue, vbCrLf, "")   '去回车换行符
    If (Len(strSubValue) > 0) Then  '有有效内容
'        RetStr = RetStr & "Match found at position "
'        RetStr = RetStr & Match.FirstIndex & ". Value is '"
'        RetStr = RetStr & strSubValue & "'." & vbCrLf
        RetStr = RetStr & strSubValue
    End If
  Next
  RegExpTest2 = RetStr
End Function
谢谢sysdzw、bcrun。bcrun的程序好像还不够准确，得到的是：
"> 英语计算：分数|听力：98|
设想的：
英语｜计算：分数｜听力：98｜写作：99
当网页源文件的数据有不同一级时，只能得到：
">
比如计算：分数</DIV></DIV>">
</form>
<div class="content">
<div class="ttt"><img src="http://ddddd.dddd.com/ddddd/ddddd.dll?ddd=D98313330373ttt63162626ttt6160tttt659" alt="英语"  width="131" height="176" ></div>
<div class="ttt">
<p class="title">英语<a href="http://ttttt.tttttttt.com/ttttinfo.tttt?tttttttttttt"  onclick="tttVitttnfo(t);"><ttt src=tttttttttt/butttttt.gif  style='vertical-align:middle;margin-left:15px'></a><a href="http://tttt.tttttttt.com/ttttttttpingtttt.aspx?tttt=tttttttt"><ttt src=tttttttttt/ttttt_ttttt_-if.gif  style='vertical-align:middle;margin-left:15px'></a></p>
<p class="tt">计算：分数<span>|</span>>听力：98 <span>|</span>>写作：99 <
上面的显示不正确，用下面的。
计算：分数</DIV></DIV>">
</form>
<div class="content">
<div class="ttt"><img src="http://ddddd.dddd.com/ddddd/ddddd.dll?ddd=D98313330373ttt63162626ttt6160tttt659" alt="英语"  width="131" height="176" ></div>
<div class="ttt">
<p class="title">英语<a href="http://ttttt.tttttttt.com/ttttinfo.tttt?tttttttttttt"  onclick="tttVitttnfo(t);"><ttt src=tttttttttt/butttttt.gif  style='vertical-align:middle;margin-left:15px'></a><a href="http://tttt.tttttttt.com/ttttttttpingtttt.aspx?tttt=tttttttt"><ttt src=tttttttttt/ttttt_ttttt_-if.gif  style='vertical-align:middle;margin-left:15px'></a></p>
<p class="tt">计算：分数<span>|</span>>听力：98 <span>|</span>>写作：99 <
试试这个：
Private Sub Form_Load()
Dim s As String, ss As String
s = "<dslkjfljfjfk><aklsdjljfdjfoiuofiufoiuofiuofiuofiuofuofuoi??>sdk<dkjofjoifuoiufoiufoufoiuofuofiu><jdoiufoiufouffoiufoiu>地方军阀三剑客《<dlkfjjfsjfl>"
Do
    ss = CutString(1, s, "<", ">")
    s = Replace(s, ss, "")
    s = Replace(s, "<>", "", , 1)
Loop While CutString(1, s, "<", ">") <> ""
Debug.Print s
End Sub
'文本截取函数
Public Function CutString(StartNum As Long, InPutString As String, LeftString As String, RightString As String)
    On Error Resume Next
    Dim StrLine As Long, StrLine2 As Long
    StrLine = InStr(StartNum, InPutString, LeftString) + Len(LeftString)
    StrLine2 = InStr(StrLine, InPutString, RightString)
    CutString = Mid(InPutString, StrLine, StrLine2 - StrLine)
End Function去除所有"<>"中间的内容
   s = Replace(s, "<>", "", , 1)这个改为
   s = Replace(s, "<>", "")
就正常了
[Quote=引用 7 楼 yiguangqiang88 的回复:]
试试这个：对于本例的执行结果是：
计算：分数">
与预想的还有差距。能否修改一下，以准确命中。谢谢。
这个只是删除“<>”和其中的内容的，除非你取的源文件不完全，否则是没问题的。至于多出的“> <”之间的部分，你再自己想想法子吧。
Dim myRegExp
Set myRegExp = New RegExp
myRegExp.Pattern = """title"">+(.*?)<[\s\S]*=""tt"">+(.*?)<+(?:.*?)>+(.*?)<+(?:.*?)>+(.*?)<+(?:.*?)>+(.*?)<(?:.*?)>+(.*?)<+"但如果变化真的太大就得另想办法了。
按照你原来的思路不必替换那些标签其实也可以的啊，你越走越远了
上面不能完全替换可能只是你取其中一部分不规范造成的，如果完整的一个页面可能就ok了
你继续试试下面的代码，应该可以达到你的要求Private Sub Form_Load()
    Dim strData$, strResult$
    Dim reg As Object
    Dim matchs As Object, match As Object
    Set reg = CreateObject("vbscript.regexp")

    strData = "网页代码"
    reg.Global = True
    reg.IgnoreCase = True
    reg.Pattern = "[\u4e00-\u9fa5：]+\d*"

    Set matchs = reg.Execute(strData)

    For Each match In matchs
        If InStr(strResult, match & "|") = 0 Then strResult = strResult & match & "|"
    Next
    If strResult <> "" Then strResult = Left(strResult, Len(strResult) - 1)

    MsgBox strResult
End Sub