This is the regex I wrote:
Set re = CreateObject("vbscript.regexp")
' \u0022 is a double quote and \u003E is ">", so this matches text such as #shared">bbs.csdn.net
re.Pattern = "#shared\u0022\u003E[A-Za-z0-9]+\.[A-Za-z0-9.]+" 'https?://[A-Za-z0-9]+\.[A-Za-z0-9.]+"
re.Global = True
re.IgnoreCase = True
Set ms = re.Execute(omg)
If ms.Count = 0 Then Exit Sub
For Each sm In ms
    Text3 = Text3 & vbCrLf & sm.Value
Next

This is the page source:
<div id="c2"><div class="xbdiv"><h3><span id="sharedha">Host names sharing IP with A records (3 items)</span></h3><ul class="xbul"><li><a href="//host.robtex.com/bbs.csdn.net.html#shared">bbs.csdn.net</a></li>
<li><a href="//host.robtex.com/community.csdn.net.html#shared">community.csdn.net</a></li>
<li><a href="//host.robtex.com/topic.csdn.net.html#shared">topic.csdn.net</a></li>

I want to extract the subdomains of csdn.net so that the final output is community.csdn.net topic.csdn.net. Could an expert please show me how?
>>> import re
>>> s = '''<div id="c2"><div class="xbdiv"><h3><span id="sharedha">Host names sharing IP with A records (3 items)</span></h3><ul class="xbul"><li><a href="//host.robtex.com/bbs.csdn.net.html#shared">bbs.csdn.net</a></li>
<li><a href="//host.robtex.com/community.csdn.net.html#shared">community.csdn.net</a></li>
<li><a href="//host.robtex.com/topic.csdn.net.html#shared">topic.csdn.net</a></li>'''
>>> res = r'>(\w+\.csdn\.net)<'  # escape the dots so they match a literal "."
>>> print(re.findall(res, s))
['bbs.csdn.net', 'community.csdn.net', 'topic.csdn.net']
>>> res = r'>((?!bbs\.)\w+\.csdn\.net)<'  # negative lookahead skips only the bbs host
>>> print(re.findall(res, s))
['community.csdn.net', 'topic.csdn.net']
>>>
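If the list of hosts to skip may change, the same idea can be wrapped in a small function. This is only a sketch: the name `extract_hosts` and its `domain` and `exclude` parameters are made up for illustration, and it assumes the goal is to drop bbs.csdn.net and join the rest with spaces, as in the desired output from the question.

```python
import re

def extract_hosts(html, domain, exclude=()):
    """Return link texts that are subdomains of `domain`, skipping any in `exclude`.

    Hypothetical helper, not part of any library.
    """
    # re.escape makes the dots in the domain literal; [\w-]+ also allows hyphens in the label.
    pattern = r'>([\w-]+\.' + re.escape(domain) + r')<'
    return [host for host in re.findall(pattern, html) if host not in exclude]

html = '''<li><a href="//host.robtex.com/bbs.csdn.net.html#shared">bbs.csdn.net</a></li>
<li><a href="//host.robtex.com/community.csdn.net.html#shared">community.csdn.net</a></li>
<li><a href="//host.robtex.com/topic.csdn.net.html#shared">topic.csdn.net</a></li>'''

# Assumption: bbs.csdn.net is the only host that should be left out.
result = " ".join(extract_hosts(html, "csdn.net", exclude={"bbs.csdn.net"}))
print(result)  # community.csdn.net topic.csdn.net
```

Filtering in Python rather than inside the pattern keeps the regex simple, and hosts that merely start with the same letter as an excluded one are not dropped by accident.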