Regex matching problem

<a href='/a/gongyi/2011/0625/8286.html' class='tt'preview/a>

 <a href="/a/gongyi/2011/0625/8286.html" class="title">红十字蓝天救援队社会心理培训班在京举办</a>-- 06-25
</li><li>
 <a href='/a/gongyi/2011/0625/8285.html' class='tt'preview/a>

 <a href="/a/gongyi/2011/0625/8285.html" class="title">担心NBA停摆姚明纳什慈善篮球赛搁浅</a>-- 06-25
</li><li>
 <a href='/a/gongyi/2011/0625/8284.html' class='tt'preview/a>

 <a href="/a/gongyi/2011/0625/8284.html" class="title">倪萍：慈善事业是艺人应该做的事情</a>-- 06-25
</li><li>
 <a href='/a/gongyi/2011/0625/8283.html' class='tt'preview/a>

 <a href="/a/gongyi/2011/0625/8283.html" class="title">斯凯孚助中国青少年参加哥德堡杯足球锦标赛</a>-- 06-25
</li><li>
 <a href='/a/gongyi/2011/0625/8282.html' class='tt'preview/a>

 <a href="/a/gongyi/2011/0625/8282.html" class="title">第六届中华慈善奖评委会召开通过表彰名单</a>-- 06-25
</li><li>
 <a href='/a/gongyi/2011/0624/8194.html' class='tt'preview/a>

 <a href="/a/gongyi/2011/0624/8194.html" class="title">中国绿化基金会——幸福家园西部绿化行动</a>-- 06-24
</li><li>
 <a href='/a/gongyi/2011/0624/8193.html' class='tt'preview/a>

 <a href="/a/gongyi/2011/0624/8193.html" class="title">民间组织启动“清汞行动”</a>-- 06-24
</li><li>
 <a href='/a/gongyi/2011/0624/8192.html' class='tt'preview/a>

 <a href="/a/gongyi/2011/0624/8192.html" class="title">甘肃慈善总会1200万建14个敬老院</a>-- 06-24
</li><li>
 <a href='/a/gongyi/2011/0624/8191.html' class='tt'preview/a>
The above is a snippet of HTML I scraped from a web page. My code:
string str= @"<a href='\S+' class='tt'preview/a>";
 string[] href;
 Regex Reg = new Regex(str, RegexOptions.IgnoreCase);
 Match m = Reg.Match(text); // start matching from the last "/"
 if (m.Success) // match succeeded
 {
 href = new string[m.Groups.Count];
 for (int i = 0; i < href.Length; i++)
 {
 href[i] = m.Groups[i].ToString();
 }
 }
 else
 {
 href = new string[0];
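 }

I want to extract the links from this HTML, but whatever I try, nothing matches. I am also building a scraper: is there a better way to pull the links out of a scraped list? And once I have the links, some sites use links that start with http, some use links that start with /, and some use bare relative links. How do I get the real, full URL of each page?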

Solutions »


// Post the result you need
Regex re = new Regex("(?<=<a\\s*href=\\\").*?(?=\\\"\\s*class=\\\"title\\\">)", RegexOptions.None);
MatchCollection mc = re.Matches(text); // pass the HTML variable, not the string literal "text"
foreach (Match ma in mc)
{
 // ma.Value is the link
}
(?i)<a[^>]*href=(['"\s])?(?<href>[^'"\s]+)\1[^>]*class=(['"\s])tt\2preview/a>
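A minimal sketch of how the pattern above might be used, assuming the scraped HTML is already in a string. The class name `LinkExtractor` and the method `ExtractPreviewLinks` are made up for illustration. In a C# verbatim string each `"` inside the pattern has to be doubled, and the link is read from the named group `href`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class LinkExtractor
{
    // Pattern quoted from the reply above; every " is doubled
    // because the pattern is written as a verbatim string.
    private static readonly Regex PreviewLink = new Regex(
        @"(?i)<a[^>]*href=(['""\s])?(?<href>[^'""\s]+)\1[^>]*class=(['""\s])tt\2preview/a>");

    // Returns the href value of every anchor in html that the pattern matches.
    public static List<string> ExtractPreviewLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in PreviewLink.Matches(html))
        {
            // The named group holds only the link, without the quotes.
            links.Add(m.Groups["href"].Value);
        }
        return links;
    }
}
```

Calling `LinkExtractor.ExtractPreviewLinks(text)` on the sample in the question should return one entry per `class='tt'` anchor and skip the `class="title"` anchors.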
What does the RegexOptions.None parameter mean?
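RegexOptions.None means that no options are set, so the pattern runs with the default behavior, which includes case-sensitive matching. A small sketch of the difference from RegexOptions.IgnoreCase follows; the class `RegexOptionsDemo` and its method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of any library.
public static class RegexOptionsDemo
{
    // No options set: matching is case-sensitive by default.
    public static bool MatchesDefault(string input)
    {
        return new Regex("<a href", RegexOptions.None).IsMatch(input);
    }

    // IgnoreCase: upper-case and lower-case letters are treated as equal.
    public static bool MatchesIgnoreCase(string input)
    {
        return new Regex("<a href", RegexOptions.IgnoreCase).IsMatch(input);
    }
}
```

With an input such as `<A HREF='x'>`, only the IgnoreCase version matches. An inline `(?i)` at the start of a pattern has the same effect as passing RegexOptions.IgnoreCase.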
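As for the full URL: open any one of the links in a browser and you will see it. Take the common prefix and prepend it to each element of the List.
 string html = "HTML content";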
 List<string> url = new List<string>();
 MatchCollection matches = Regex.Matches(html, @"<a\s*?href='([^']*?)'[^>]*?>", RegexOptions.IgnoreCase);
 if(matches.Count>=1)
 {
 foreach (Match match in matches)
 {
 url.Add(match.Groups[1].Value);
 }
 }
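Prepending a fixed prefix only covers links that start with /. As an alternative sketch, assuming you know the address of the page the list was scraped from, System.Uri can resolve all three kinds of link (full http links, links starting with /, and bare relative links) against that page address. The class `UrlResolver`, the method `ToAbsolute`, and the example.com address in the usage note are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
public static class UrlResolver
{
    // pageUrl: address of the page the links were scraped from (assumed known).
    // hrefs:   raw href values, e.g. the List<string> filled in the code above.
    public static List<string> ToAbsolute(string pageUrl, IEnumerable<string> hrefs)
    {
        var baseUri = new Uri(pageUrl);
        var result = new List<string>();
        foreach (string href in hrefs)
        {
            // Combines baseUri with href; an href that is already absolute
            // is kept as it is. Values that cannot be parsed are skipped.
            if (Uri.TryCreate(baseUri, href, out Uri absolute))
            {
                result.Add(absolute.AbsoluteUri);
            }
        }
        return result;
    }
}
```

For example, `UrlResolver.ToAbsolute("http://www.example.com/list/index.html", url)` would turn a link starting with / into one under http://www.example.com, and a bare relative link into one under http://www.example.com/list/.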