I have read a page's HTML into a string, and now I need to scan the whole page for a tag like the one below:
<link href="http://blog.donews.com/laobai/rss.aspx" title="RSS" type="application/rss+xml" rel="alternate" >
and extract the href address. Note: I only want the address from link tags that contain type="application/rss+xml".
string strValue = "<link href='http://blog.donews.com/laobai/rss.aspx' title='RSS' type='application/rss+xml' rel='alternate'>";
Regex reg = new Regex( "<link.*href=[\']?([^\']*)([\']?[^>]*>)",
    RegexOptions.Compiled | RegexOptions.IgnoreCase );
foreach( Match m in reg.Matches( strValue ) )
{
    if( m.Groups[2].Value.IndexOf( "type='application/rss+xml'" ) >= 0 )
        Debug.WriteLine( m.Groups[1].Value );
}
But why does it find nothing once I replace strValue with the HTML string of the entire page?
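One likely cause, assuming the real page writes its attributes with double quotes as in the sample tag at the top of the thread: the pattern and the IndexOf check both look for single quotes. A minimal sketch that shows what group 2 ends up holding in each case (the class and method names are made up for illustration):

```csharp
using System.Text.RegularExpressions;

public static class QuoteMismatchDemo
{
    // The single-quote pattern from the snippet above, unchanged.
    private static readonly Regex SingleQuotePattern = new Regex(
        "<link.*href=[\']?([^\']*)([\']?[^>]*>)",
        RegexOptions.IgnoreCase);

    // Returns what group 2 captured for the first match
    // (an empty string when the pattern does not match at all).
    public static string TrailingGroup(string html)
    {
        return SingleQuotePattern.Match(html).Groups[2].Value;
    }
}
```

With the single-quoted test string, group 2 holds the rest of the tag, including type='application/rss+xml', so the check passes. With the double-quoted tag, [^\']* finds no single quote to stop at and swallows almost everything after href=, leaving only the final > for group 2, so IndexOf returns -1 and nothing is printed. On a full page the same group can run on well past the tag it started in.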
The expression for the type="application/rss+xml" case, with double-quoted attributes:
string strValue = "<link href=\"http://blog.donews.com/laobai/rss.aspx\" title=\"RSS\" type=\"application/rss+xml\" rel=\"alternate\">";
Regex reg = new Regex( "<link.*href=[\"]?([^\"]*)([\"]?[^>]*>)",
    RegexOptions.Compiled | RegexOptions.IgnoreCase );
foreach( Match m in reg.Matches( strValue ) )
{
    if( m.Groups[2].Value.IndexOf( "type=\"application/rss+xml\"" ) >= 0 )
        Debug.WriteLine( m.Groups[1].Value );
}
How should the address inside such a link be extracted?
{
    Console.WriteLine(m.Groups[1].Value);
}
Output:
http://blog.donews.com/laobai/rss.aspx
http://blog.donews.com/laobai/rss.aspx
<link href="http://blog.donews.com/1/rss.aspx" title="RSS" type="application/rss+xml" rel="alternate" ><link href="http://blog.donews.com/2/rss.aspx" title="RSS" type="application/rss+xml" rel="alternate" ><link href="http://blog.donews.com/3/rss.aspx" title="RSS" type="application/rss+xml" rel="alternate" ><link href="http://blog.donews.com/4/rss.aspx" title="RSS" type="application/rss+xml" rel="alternate" ><link href="http://blog.donews.com/5/rss.aspx" title="RSS" type="application/rss+xml" rel="alternate" ><link href="http://blog.donews.com/6/rss.aspx" title="RSS" type="application/rss+xml" rel="alternate" ><link href="http://blog.donews.com/7/rss.aspx" title="RSS" type="application/rss+xml" rel="alternate" ><link href="http://blog.donews.com/8/rss.aspx" title="RSS" type="application/rss+xml" rel="alternate" >
<link href="http://blog.donews.com/9/rss.aspx" title="RSS" type="application/rss+xml" rel="alternate" ><link href="http://blog.donews.com/10/rss.aspx" title="RSS" type="application/rss+xml" rel="alternate" ><link href="http://blog.donews.com/11/rss.aspx" title="RSS" type="application/rss+xml" rel="alternate" ><link href="http://blog.donews.com/21/rss.aspx" title="RSS" type="application/rss+xml" rel="alternate" >
<SCRIPT LANGUAGE="JavaScript">
<!--
function a() {
    // Collect every <link> element in the document.
    var links = document.getElementsByTagName("link");
    for (var i = 0; i < links.length; i++)
    {
        if (links[i].type.toLowerCase() == "application/rss+xml")
        {
            document.write(links[i].href + "<br>");
            //alert(links[i].href)
        }
        else
        {
            document.write("no match" + "<br>");
        }
    }
}
a();
//-->
</SCRIPT>
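The script above checks each link element on its own, and the same idea carries over to C# when only the HTML string is available. A minimal sketch, not a full HTML parser: it assumes no attribute value contains a > character, and the names RssLinkFinder and FindRssLinks are invented for illustration. It first isolates each link tag, then tests the type attribute and pulls out href from that tag alone, so several tags on one line cannot bleed into each other.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RssLinkFinder
{
    // Step 1: one match per <link ...> tag; [^>]* cannot run past the tag's end.
    private static readonly Regex LinkTag = new Regex(
        @"<link\b[^>]*>", RegexOptions.IgnoreCase);

    // Step 2a: type must be application/rss+xml, in either quote style.
    private static readonly Regex RssType = new Regex(
        @"\btype\s*=\s*([""'])application/rss\+xml\1", RegexOptions.IgnoreCase);

    // Step 2b: the href value, quoted (either style) or unquoted.
    private static readonly Regex Href = new Regex(
        @"\bhref\s*=\s*(?:([""'])(?<url>.*?)\1|(?<url>[^\s>]+))",
        RegexOptions.IgnoreCase);

    public static List<string> FindRssLinks(string html)
    {
        var urls = new List<string>();
        foreach (Match tag in LinkTag.Matches(html))
        {
            // Skip stylesheets and any other non-RSS link tags.
            if (!RssType.IsMatch(tag.Value)) continue;

            Match href = Href.Match(tag.Value);
            if (href.Success) urls.Add(href.Groups["url"].Value);
        }
        return urls;
    }
}
```

Calling RssLinkFinder.FindRssLinks(pageHtml) returns the feed addresses in document order; tags whose type is anything else are skipped.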
<link\b(?=(?:(?!\btype=).)*?type="application/rss\+xml"[^>]*>)((?!\bhref=).)*href=(("|')(?<href>.*?)\3|(?<href>[^\s>]*))[^>]*>
Expressed in C#:
string str = @"<link\b(?=(?:(?!\btype=).)*?type=""application/rss\+xml""[^>]*>)((?!\bhref=).)*href=((""|')(?<href>.*?)\3|(?<href>[^\s>]*))[^>]*>";
Recommended tool for writing regular expressions:
http://www.regexlab.com/mtracer/
Test data:
<link href="http://blog.donews.com/1/rss.aspx" title="RSS" type="application/rss+xml" rel="alternate" >
<link href="http://blog.donews.com/2/rss.aspx" title="RSS" type="application/rss+xml" rel="alternate" >
<link href="http://blog.donews.com/3/rss.aspx" title="RSS" type="2356346" rel="alternate" >
Regex mode: Singleline
Match count: 2
∮∮∮∮ Match 1 ∮∮∮∮
Group 1: http://blog.donews.com/1/rss.aspx
*******************
∮∮∮∮ Match 2 ∮∮∮∮
Group 1: http://blog.donews.com/2/rss.aspx
*******************
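To reproduce the result above from C#, the posted pattern can be run with RegexOptions.Singleline and read through its named group. A minimal sketch; the class and method names are hypothetical, and the pattern itself is copied from the post unchanged:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadPatternDemo
{
    // The pattern from the post. The lookahead requires
    // type="application/rss+xml" before the href is captured.
    private static readonly Regex RssLink = new Regex(
        @"<link\b(?=(?:(?!\btype=).)*?type=""application/rss\+xml""[^>]*>)((?!\bhref=).)*href=((""|')(?<href>.*?)\3|(?<href>[^\s>]*))[^>]*>",
        RegexOptions.Singleline);

    public static List<string> Extract(string html)
    {
        var urls = new List<string>();
        foreach (Match m in RssLink.Matches(html))
        {
            // Both alternatives share the name "href", so one lookup covers
            // the quoted and the unquoted form.
            urls.Add(m.Groups["href"].Value);
        }
        return urls;
    }
}
```

Fed the three test lines, Extract returns the /1/ and /2/ addresses and skips the third tag, whose type is "2356346". One caveat: in Singleline mode the dot also matches line breaks and >, so the lookahead is not confined to a single tag, and a link tag that has no type attribute at all and sits in front of an RSS link may be reported as well.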