Question: grabbing another website's HTML and extracting data from it

The following is part of a site's HTML source. I want to grab the data in these rows:
<tr align="center" bgcolor="#EFEFEF">
      <td width="8%" nowrap bgcolor="#CFDBE8" title="搜寻引擎收录情况">收录情况</td>
      <td width="7%"><a href='http://www.baidu.com/s?wd=site%3Awww.baidu.com&cl=3' target=_blank title='2140' rel=nofollow class=LN>2140</a></td>
      <td width="7%"><a href='http://www.google.cn/search?hl=zh-CN&q=site%3Awww.baidu.com' target=_blank title='36700' rel=nofollow class=LN>36700</a></td>
      <td width="8%"><a href='http://sitemap.cn.yahoo.com/search?bwm=p&p=www.baidu.com' target=_blank title='927' rel=nofollow class=LN>927</a></td>
      <td width="7%"><a href='http://www.sogou.com/web?query=site%3Awww.baidu.com' target=_blank title='870048' rel=nofollow class=LN>870048</a></td>
      <td width="7%"><a href='http://www.soso.com/q?w=site%3Awww.baidu.com&sc=web&ch=w.ptl&lr=chs' target=_blank title='25900' rel=nofollow class=LN>25900</a></td>
    </tr>
    <tr align="center" bgcolor="#EFEFEF">
      <td nowrap bgcolor="#CFDBE8" title="外部网站链接到你的网站">反向链接</td>
      <td><a href='http://www.baidu.com/s?wd=domain%3Awww.baidu.com&cl=3' target=_blank title='1900000' rel=nofollow class=LN>1900000</a></td>
      <td><a href='http://www.google.cn/search?hl=zh-CN&q=link%3Awww.baidu.com' target=_blank title='0' rel=nofollow class=LN>0</a></td>
      <td><a href='http://sitemap.cn.yahoo.com/search?p=www.baidu.com&bwm=i' target=_blank title='5175919' rel=nofollow class=LN>5175919</a></td>
      <td><a href='http://www.sogou.com/web?query=link%3Awww.baidu.com&num=10' target=_blank title='2939831' rel=nofollow class=LN>2939831</a></td>
      <td><a href='http://www.soso.com/q?w=link%3Awww.baidu.com&sc=web&ch=w.ptl&lr=chs' target=_blank title='4130' rel=nofollow class=LN>4130</a></td>
I only need the numeric values in these rows; the other attributes can be ignored. Any help appreciated.

Answers


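Here is some code to start from:
    // Requires: using System.Collections.Generic; using System.IO; using System.Net;
    //           using System.Text; using System.Text.RegularExpressions;
    WebRequest request = WebRequest.Create("URL of the page to scrape");
    string allstrm;
    using (WebResponse response = request.GetResponse())
    using (StreamReader sr = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("UTF-8")))
    {
        // Read the whole page into one string; change the encoding if the page is not UTF-8.
        allstrm = sr.ReadToEnd();
    }
    string strPattern = @"regex matching the content you want";
    List<string> results = new List<string>();
    MatchCollection matches = Regex.Matches(allstrm, strPattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
    foreach (Match nextMatch in matches)
    {
        // Groups[0] is the whole match; use Groups[1], Groups[2], ... for capture groups.
        // Collect every match instead of overwriting a single variable.
        results.Add(nextMatch.Groups[0].Value.Trim());
    }
All you need to do is replace the placeholder strings (the URL and the pattern) with your own.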
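If you want to test the matching step from the code above without hitting the network, a minimal sketch is to pass in HTML you already have as a string. `ScrapeHelper` and `ExtractGroup` are hypothetical names, and the sketch assumes the value you want sits in capture group 1 of your pattern:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: runs a pattern over HTML that has already been downloaded
// and collects capture group 1 from every match.
public static class ScrapeHelper
{
    public static List<string> ExtractGroup(string html, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            // Groups[0] is the entire match; Groups[1] is the first (...) in the pattern.
            results.Add(m.Groups[1].Value.Trim());
        }
        return results;
    }
}
```

For example, `ScrapeHelper.ExtractGroup(html, @"title='(\d+)'")` would return the numbers held in the `title` attributes of the sample rows.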
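What I want is to pull some of the data out of this HTML. For example, from
<td width="7%"><a href='http://www.baidu.com/s?wd=site%3Awww.baidu.com&cl=3' target=_blank title='2140' rel=nofollow class=LN>2140</a></td>
I want to extract 2140.
More precisely, I want all the values in the two <TR> rows labeled "收录情况" (indexed pages) and "反向链接" (backlinks).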
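For the two rows described above, one possible approach (a sketch, not taken from any answer in this thread) is to isolate each row by its label cell first and then read the numbers from the link text inside it. `RowExtractor` and `NumbersInRow` are hypothetical names, and the sketch assumes every value appears as `class=LN>digits</a>`, as in the sample HTML:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sketch: return every number shown as link text (class=LN>…</a>) in the row
// whose label cell is the given text, e.g. "收录情况" or "反向链接".
public static class RowExtractor
{
    public static List<string> NumbersInRow(string html, string label)
    {
        var numbers = new List<string>();

        // From the label cell, lazily take everything up to the end of that row:
        // a closing </tr>, the next <tr, or the end of the input (the sample's
        // second row has no </tr>). Singleline lets '.' cross line breaks.
        Match row = Regex.Match(html,
            ">" + Regex.Escape(label) + @"</td>(.*?)(?=</tr>|<tr\b|$)",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
        if (!row.Success) return numbers;

        // Each engine's value is the link text, e.g. class=LN>2140</a>
        foreach (Match m in Regex.Matches(row.Groups[1].Value, @"class=LN>(\d+)</a>"))
            numbers.Add(m.Groups[1].Value);
        return numbers;
    }
}
```

Calling it once with each label gives the indexed-page counts and the backlink counts as two separate lists.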
Use a regex:
<a[^>]*?title='(\d+)' rel=nofollow class=LN>\1</a>
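The backreference `\1` in the pattern above requires the link text to equal the number captured from `title='…'`. A minimal C# sketch of using it, with the hypothetical names `BackrefDemo` and `Numbers`, assuming the attribute order of the sample HTML (`title`, then `rel=nofollow class=LN`):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BackrefDemo
{
    // Group 1 captures the digits in title='…'; \1 then forces the link text
    // to be exactly the same digits, so mismatched links are skipped.
    static readonly Regex LinkNumber =
        new Regex(@"<a[^>]*?title='(\d+)' rel=nofollow class=LN>\1</a>");

    public static List<string> Numbers(string html)
    {
        var list = new List<string>();
        foreach (Match m in LinkNumber.Matches(html))
            list.Add(m.Groups[1].Value);
        return list;
    }
}
```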
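    Regex reg = new Regex(@"class=LN>.+?</a>");
    Match mat = reg.Match(html);
    while (mat.Success)   // Match exposes Success, not Successful
    {
        // Strip every non-digit character, e.g. "class=LN>2140</a>" becomes "2140".
        Response.Write(Regex.Replace(mat.Value, @"[^\d]*", ""));
        mat = mat.NextMatch();   // continue searching after the current match
    }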
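The loop above writes straight to the ASP.NET `Response`. A minimal sketch of the same `Match.Success` / `NextMatch()` loop that returns the numbers instead, so it can run outside a web page (`LinkTextNumbers` and `Collect` are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkTextNumbers
{
    public static List<string> Collect(string html)
    {
        var numbers = new List<string>();
        var reg = new Regex(@"class=LN>.+?</a>");
        Match mat = reg.Match(html);
        while (mat.Success)
        {
            // Keep only the digits of the matched text: "class=LN>2140</a>" -> "2140"
            numbers.Add(Regex.Replace(mat.Value, @"\D", ""));
            mat = mat.NextMatch();   // next match after the current one
        }
        return numbers;
    }
}
```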
<a[^>]*title='(\d+)'[^>]*>\1</a>
\<td[^\>]*\>收录情况\</td\>\s*
\<td[^\>]*\>\<a[^\>]*\>.*?(?<V1>\d+).*?\</a\>\</td\>\s*
\<td[^\>]*\>\<a[^\>]*\>.*?(?<V2>\d+).*?\</a\>\</td\>\s*
\<td[^\>]*\>\<a[^\>]*\>.*?(?<V3>\d+).*?\</a\>\</td\>\s*
\<td[^\>]*\>\<a[^\>]*\>.*?(?<V4>\d+).*?\</a\>\</td\>\s*
\<td[^\>]*\>\<a[^\>]*\>.*?(?<V5>\d+).*?\</a\>\</td\>\s*
Join these lines into a single pattern (remove the line breaks); the named groups V1...V5 hold the numbers you want. Replace 收录情况 with 反向链接 to read the backlink row.
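The row pattern above can also be assembled in code for either label and read through its named groups. A minimal sketch, assuming exactly five engine cells per row as in the sample; `RowPattern` and `ReadRow` are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RowPattern
{
    public static List<string> ReadRow(string html, string label)
    {
        // Label cell, then five cells of the form <td ...><a ...>…digits…</a></td>
        string pattern = @"<td[^>]*>" + Regex.Escape(label) + @"</td>\s*";
        for (int i = 1; i <= 5; i++)
            pattern += $@"<td[^>]*><a[^>]*>.*?(?<V{i}>\d+).*?</a></td>\s*";

        var values = new List<string>();
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        if (!m.Success) return values;

        // Read the named groups V1..V5 in column order.
        for (int i = 1; i <= 5; i++)
            values.Add(m.Groups["V" + i].Value);
        return values;
    }
}
```

Building the pattern in a loop avoids the manual "remove the line breaks" step and keeps the five cell patterns identical.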
    // Requires: using System.Text.RegularExpressions;
    // Removes scripts, frames, images and event handlers, then strips every remaining tag.
    public string checkStr(string html)
    {
        RegexOptions opts = RegexOptions.IgnoreCase;
        html = Regex.Replace(html, @"<script[\s\S]*?</script *>", "", opts);     // remove <script>...</script> blocks
        html = Regex.Replace(html, @" href *= *[^>]*?script *:", "", opts);      // remove href=javascript: on <a>
        html = Regex.Replace(html, @" on\w+ *=", " _disibledevent=", opts);      // neutralize on... event attributes
        html = Regex.Replace(html, @"<iframe[\s\S]*?</iframe *>", "", opts);     // remove iframes
        html = Regex.Replace(html, @"<frameset[\s\S]*?</frameset *>", "", opts); // remove framesets
        html = Regex.Replace(html, @"<img[^>]+>", "", opts);                     // remove images
        html = Regex.Replace(html, @"</?p>", "", opts);                          // remove <p> and </p>
        html = Regex.Replace(html, @"<[^>]*>", "", opts);                        // remove every remaining tag (including <strong>)
        html = html.Replace(" ", "");                                            // remove all spaces
        return html;
    }
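One caveat with the tag-stripping approach above: as written, it replaces tags with an empty string and then deletes every space, so adjacent cells run together (2140 followed by 36700 becomes 214036700). If plain numbers are all you need, a minimal alternative sketch is to replace each tag with a space and then pick out the digit runs. `TextNumbers` and `FromHtml` are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TextNumbers
{
    // Replace every tag with a space (not ""), so neighbouring cells stay separate,
    // then return each run of digits in document order.
    public static List<string> FromHtml(string html)
    {
        string text = Regex.Replace(html, @"<[^>]*>", " ");
        var numbers = new List<string>();
        foreach (Match m in Regex.Matches(text, @"\d+"))
            numbers.Add(m.Value);
        return numbers;
    }
}
```

Because attributes such as `width="8%"` live inside tags, they are removed before the digit scan and do not show up in the result.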
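OP, the regexes above are basically there; they just need some fine-tuning. I'd suggest learning a little regex and then checking the results in a regex testing tool.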