I am looking for a regular expression that extracts local URLs only, where the linked file must be an HTML file.
For example:
<table bgcolor=#EDF0F5 style="padding-left:9px"><a href="http://news.sina.com.cn/c/2006-04-06/13369549339.shtml" target=_blank>陈水扁污蔑大陆送台湾熊猫是“统战伎俩”</a> <font COLOR=gray>2006/04/06/ 13:36:49<br/>
<a href="data/dg/jxdg.htm" >教学大纲</a>
</table>
The required extraction result is:
教学大纲 data/dg/jxdg.htm
<a\s[^>]*?href="?(?<url>[\w/%&=-]+\.html?)"?[^>]*>(?<title>.+?)</a>
Take the url group for the link and the title group for the link text.
<a href="data1111/dg/jxdg.htm" >教学大纲11111</a>
</table>
<table bgcolor=#EDF0F5 style="padding-left:9px"><a href="http://news.sina.com.cn/c/2006-04-06/13369549339.shtml" target=_blank>陈水扁污蔑大陆送台湾熊猫是“统战伎俩”</a> <font COLOR=gray>2006/04/06/ 13:36:49<br/>
<a href="data2222/dg/jxdg.htm" >教学大纲22222</a>
</table>
<table bgcolor=#EDF0F5 style="padding-left:9px"><a href="http://news.sina.com.cn/c/2006-04-06/13369549339.shtml" target=_blank>陈水扁污蔑大陆送台湾熊猫是“统战伎俩”</a> <font COLOR=gray>2006/04/06/ 13:36:49<br/>
<a href="data33333/dg/jxdg.htm" >教学大纲3333</a>
</table>
Expression: <a.*href="(?<mytag>(?!http)[^"]*)".*>.*</a>
Result:
data1111/dg/jxdg.htm
data2222/dg/jxdg.htm
data33333/dg/jxdg.htm
Tester: http://birdshover.cnblogs.com/archive/2006/05/10/396844.html
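A caveat when excluding absolute links by their prefix: in .NET regex syntax, a negated class such as [^http] consumes a single character that is not 'h', 't' or 'p'. It does not mean "does not start with http", so it also drops local links whose path begins with one of those letters. A negative lookahead, (?!http), tests the prefix without consuming anything. The sketch below compares the two; the class name, the field names and the sample path help/index.htm are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ClassVersusLookahead
{
    // Negated character class: the first character of the url must not be 'h', 't' or 'p'.
    public static readonly Regex NegatedClass =
        new Regex(@"href=""(?<url>[^http][^""]*)""");

    // Negative lookahead: the url must not begin with the literal text "http".
    public static readonly Regex Lookahead =
        new Regex(@"href=""(?<url>(?!http)[^""]*)""");

    public static void Main()
    {
        string[] samples =
        {
            "<a href=\"data/dg/jxdg.htm\">",          // local link from the thread
            "<a href=\"help/index.htm\">",            // hypothetical local link starting with 'h'
            "<a href=\"http://news.sina.com.cn/\">"   // absolute link
        };

        foreach (string s in samples)
        {
            // Show whether each expression accepts the sample.
            Console.WriteLine("{0}  class={1}  lookahead={2}",
                s, NegatedClass.IsMatch(s), Lookahead.IsMatch(s));
        }
    }
}
```

Both expressions accept data/dg/jxdg.htm and reject the http:// link, but only the lookahead version accepts help/index.htm.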
教学大纲11111 data1111/dg/jxdg.htm
教学大纲22222 data2222/dg/jxdg.htm
教学大纲3333 data33333/dg/jxdg.htm
MatchCollection ms = reg.Matches(strValue);
foreach(Match m in ms)
{
this.textBox2.AppendText(m.Groups["title"].Value+m.Groups["url"].Value+"\n");
}
Please take a look: how should this be modified?
MatchCollection ms = reg.Matches(strValue);
foreach(Match m in ms)
{
this.textBox2.AppendText(m.Groups["title"].Value+m.Groups["url"].Value+"\r\n");
}
Only data33333/dg/jxdg.htm is kept.
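You can modify the code like this:
// [^>]*? keeps each match inside one <a ...> tag, so a match cannot run from the first link to the last href on a line
System.Text.RegularExpressions.Regex reg = new Regex(@"<a\s[^>]*?href\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)'|(?<url>[^>\s]+))[^>]*>(?<title>[^<>]*)</a>", RegexOptions.IgnoreCase);
MatchCollection ms = reg.Matches(strValue);
foreach(Match m in ms)
{
    // title, a space, then url, one link per line
    this.textBox2.AppendText(m.Groups["title"].Value+" "+m.Groups["url"].Value+"\r\n");
}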
How can this be modified to filter out non-local links and non-HTML files?
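One possible way to do the filtering inside the expression itself is sketched below. It is not a full HTML parser. It assumes that "local" means an href with no scheme (no http:, ftp:, mailto: and so on) and no leading //, and that "HTML file" means a path ending in .htm or .html with no query string or fragment. The class LocalHtmlLinks and its Extract method are hypothetical names; link text containing nested tags is not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LocalHtmlLinks
{
    // (?![a-z][a-z0-9+.\-]*:|//)  rejects an href that starts with a scheme or with "//".
    // [^"'\s>?#]+\.html?          requires the path to end in .htm or .html.
    // (?=["'\s>])                 makes sure the extension is really the end of the value.
    private static readonly Regex LinkPattern = new Regex(
        @"<a\s[^>]*?href\s*=\s*[""']?(?<url>(?![a-z][a-z0-9+.\-]*:|//)[^""'\s>?#]+\.html?)(?=[""'\s>])[""']?[^>]*>(?<title>[^<]*)</a>",
        RegexOptions.IgnoreCase);

    // Returns (title, url) pairs for every local .htm/.html link found in the input.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var links = new List<KeyValuePair<string, string>>();
        foreach (Match m in LinkPattern.Matches(html))
        {
            links.Add(new KeyValuePair<string, string>(
                m.Groups["title"].Value.Trim(),
                m.Groups["url"].Value));
        }
        return links;
    }

    public static void Main()
    {
        // Shortened version of the sample from the question.
        string html =
            "<table><a href=\"http://news.sina.com.cn/c/2006-04-06/13369549339.shtml\" target=_blank>news</a>\n" +
            "<a href=\"data/dg/jxdg.htm\" >教学大纲</a>\n" +
            "</table>";

        foreach (KeyValuePair<string, string> link in Extract(html))
        {
            // Title, a space, then url, as requested in the question.
            Console.WriteLine(link.Key + " " + link.Value);
        }
    }
}
```

With the sample above, only the data/dg/jxdg.htm link is returned; the http:// link is rejected by the lookahead and would also fail the extension check because it ends in .shtml. In the WinForms code from the thread, the loop body could call this.textBox2.AppendText(link.Key + " " + link.Value + "\r\n") instead of Console.WriteLine.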