I want to extract the links that meet certain criteria from a web page.

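The page contains:
<a href="a.htm"><img src=a.jpg></a>other content<a href="a.htm">This is a</a>other content
<a href="b.htm"><img src=b.jpg></a>other content<a href="b.htm">This is b</a>other content
<a href="c.htm"><img src=c.jpg></a>other content<a href="c.htm">This is c</a>other content

I want to extract the page's links "a.htm", "b.htm", "c.htm", but I don't know how to write the regular expression.
Each of "a.htm", "b.htm", "c.htm" always appears twice, so the matches will contain duplicates. Thanks.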

Solutions »


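// \w (not /w) is the word-character class and the dot is escaped;
// the ^ and $ anchors are dropped so the pattern can match inside the page text.
string p_Pattern = @"\w+\.htm";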
MatchCollection urls = Regex.Matches(p_Input, p_Pattern, RegexOptions.IgnoreCase);
Hashtable arr = new Hashtable();

foreach(Match url in urls){
  try{
arr.Add(url.ToString());
  }catch{}
}
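Correction: Hashtable.Add takes a key and a value.
foreach(Match url in urls){
  try{
    arr.Add(url.ToString(), url.ToString());
  }catch{} // a duplicate key throws, so each link is stored only once
}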
小飞羊, please don't match only *.htm.
I want to match <a href="*"> and get the content of *. Thanks.
Then match all the <a href="*"> tags in a first pass, and loop over the results to match the * part in each one.
The first-pass pattern would be <a href="\w+\.htm">
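The two-pass idea described above can be sketched roughly as follows. This is only an illustration under some assumptions: the class name TwoPassLinks and the method Extract are made up for this example, href values are assumed to be double-quoted, and the first-pass pattern is loosened to accept any <a ...> tag rather than only \w+.htm targets.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the thread: illustrates the two-pass approach.
public static class TwoPassLinks
{
    public static List<string> Extract(string html)
    {
        var seen = new HashSet<string>();   // remembers links already collected
        var result = new List<string>();    // keeps first-occurrence order

        // Pass 1: find every opening <a ...> tag.
        foreach (Match tag in Regex.Matches(html, @"<a\s[^>]*>", RegexOptions.IgnoreCase))
        {
            // Pass 2: pull the double-quoted href value out of that one tag.
            Match href = Regex.Match(tag.Value, @"href\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);

            // HashSet.Add returns false for a value that is already present.
            if (href.Success && seen.Add(href.Groups[1].Value))
            {
                result.Add(href.Groups[1].Value);
            }
        }
        return result;
    }

    public static void Main()
    {
        string html = "<a href=\"a.htm\"><img src=a.jpg></a>other content<a href=\"a.htm\">This is a</a>";
        foreach (string url in Extract(html))
        {
            Console.WriteLine(url);
        }
    }
}
```

The HashSet does the same job as the try/catch around Hashtable.Add in the earlier reply, without relying on an exception for duplicates.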
// try this
using System;
using System.Collections.Specialized;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        string html = "your html here";
        StringCollection urls = GetUrlsInHtml(html);
        foreach (string url in urls)
        {
            Console.WriteLine(url);
        }
    }

    public static StringCollection GetUrlsInHtml(string html)
    {
        // The named group "url" captures the href value; this pattern only accepts \w+.htm targets.
        string pattern = @"href=""(?<url>\w+\.htm)""";
        Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
        MatchCollection matches = regex.Matches(html);
        StringCollection urls = new StringCollection();
        foreach (Match match in matches)
        {
            string url = match.Groups["url"].Value;

            // Skip links that were already collected.
            if (!urls.Contains(url))
            {
                urls.Add(url);
            }
        }
        return urls;
    }
}
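Since the question asked for the content of any <a href="*">, not just *.htm files, a variant with a looser capture group might look like the sketch below. The names AnyHrefLinks and Extract and the optional suffix filter are assumptions made for this example, and only double-quoted href values are handled. Regular expressions are fragile on real-world HTML, so treat this as a starting point rather than a complete solution.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the thread: one pass, any double-quoted href value.
public static class AnyHrefLinks
{
    // [^""]* captures everything up to the closing quote, so the target need not end in .htm.
    private static readonly Regex HrefRegex =
        new Regex(@"<a\s[^>]*?href\s*=\s*""(?<url>[^""]*)""", RegexOptions.IgnoreCase);

    // suffix is an optional filter (an assumption for this sketch); null keeps every link.
    public static List<string> Extract(string html, string suffix = null)
    {
        return HrefRegex.Matches(html)
            .Cast<Match>()                               // MatchCollection is non-generic on older frameworks
            .Select(m => m.Groups["url"].Value)
            .Where(u => suffix == null ||
                        u.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
            .Distinct()                                  // drops the repeated links
            .ToList();
    }

    public static void Main()
    {
        string html =
            "<a href=\"a.htm\"><img src=a.jpg></a>other content<a href=\"a.htm\">This is a</a>" +
            "<a href=\"news/b.aspx?id=1\">This is b</a>";

        foreach (string url in Extract(html))            // all distinct links
        {
            Console.WriteLine(url);
        }
        foreach (string url in Extract(html, ".htm"))    // only links ending in .htm
        {
            Console.WriteLine(url);
        }
    }
}
```

Keeping the "which links qualify" rule in a separate filter, instead of baking it into the pattern, makes it easier to change the criteria later without touching the regular expression.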