Looking for a regex to extract content (building a scraper)

<a href='http://info.china.alibaba.com/news/detail/v6-d1006486548.html' onmousedown="return aliclick(this,'?tracelog=yl_vip_plas');" target="_blank">中废网8月17日各地废钢价格行情</a> [2009-08-17 11:22]</td></tr>

The above is a piece of HTML that I need to scrape. For each <a> I want the link URL and the link text, plus the date that follows it (2009-08-17 11:22). How can I get these? It's urgent. Thanks, everyone, for your help!

Solutions »


// Capture the href, the link text, and the bracketed date that follows each link.
// [^']* and \s* keep each match inside a single link, even when several links share a line.
Regex ex = new Regex(@"<a href='([^']*)'[^>]*>(.*?)</a>\s*\[(.*?)\]");
MatchCollection ms = ex.Matches("<a href='http://info.china.alibaba.com/news/detail/v6-d1006486548.html' onmousedown=\"return aliclick(this,'?tracelog=yl_vip_plas');\" target=\"_blank\">中废网8月17日各地废钢价格行情 </a> [2009-08-17 11:22] </td> </tr>");
foreach (Match m in ms)
{
    Console.WriteLine(m.Groups[1].Value); // link URL
    Console.WriteLine(m.Groups[2].Value); // link text
    Console.WriteLine(m.Groups[3].Value); // date
}
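// I saved the source of that page (the HTML) in test.txt.
string html;
using (StreamReader sr = new StreamReader("e:\\test.txt", Encoding.UTF8))
{
    html = sr.ReadToEnd();
}
// This pattern expects the time inside a <span> right after the link.
// The time group reads up to the next tag; a trailing lazy (.*?) would always match an empty string.
Regex reg1 = new Regex(@"<a.*?href='(?<src>[^']*)'[^>]*>(?<content>.*?)</a>\s*<span[^>]*>(?<time>[^<]*)", RegexOptions.IgnoreCase);
MatchCollection mc1 = reg1.Matches(html); // the string to search
foreach (Match m1 in mc1)
{
    this.textBox1.Text += "Link: " + m1.Groups["src"].Value + "; Content: " + m1.Groups["content"].Value + "; Time: " + m1.Groups["time"].Value + Environment.NewLine;
}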
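
For reference, here is a minimal self-contained sketch that targets the layout in the question, where the date sits in square brackets right after </a>, not inside a <span>. The names LinkExtractor, LinkPattern and Extract are made up for illustration, and the pattern assumes every entry on the page follows the same layout as the sample; a real page may need adjustments.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Hypothetical pattern: href in single or double quotes, any further attributes,
    // the link text, then a date in square brackets after the closing tag.
    private static readonly Regex LinkPattern = new Regex(
        @"<a\s[^>]*?href=(?<q>['""])(?<url>.*?)\k<q>[^>]*>(?<text>.*?)</a>\s*\[(?<date>[^\]]*)\]",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns one (Url, Text, Date) tuple per link that is followed by a bracketed date.
    public static List<(string Url, string Text, string Date)> Extract(string html)
    {
        var results = new List<(string Url, string Text, string Date)>();
        foreach (Match m in LinkPattern.Matches(html))
        {
            results.Add((m.Groups["url"].Value,
                         m.Groups["text"].Value.Trim(),
                         m.Groups["date"].Value.Trim()));
        }
        return results;
    }

    public static void Main()
    {
        // The sample row from the question.
        string html = "<a href='http://info.china.alibaba.com/news/detail/v6-d1006486548.html' "
                    + "onmousedown=\"return aliclick(this,'?tracelog=yl_vip_plas');\" target=\"_blank\">"
                    + "中废网8月17日各地废钢价格行情</a> [2009-08-17 11:22]</td></tr>";

        foreach (var item in Extract(html))
        {
            Console.WriteLine("Link: " + item.Url);
            Console.WriteLine("Text: " + item.Text);
            Console.WriteLine("Date: " + item.Date);
        }
    }
}
```

Named groups keep the loop readable, and the backreference \k<q> makes the closing quote match whichever quote opened the href value. For anything beyond a fixed, predictable layout like this one, an HTML parser is usually more robust than a regex.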
