<table width="100%" border="0" cellpadding="2" cellspacing="0" class="ft12r">
    <tbody>
        <tr>
            <td align="left" class="ft12h">
                <a id="pageLink" href="node_4.htm" class="ft12h">第01版:要闻</a>
            </td>
            <td nowrap align="right" width="16" class="ft12h">
                <a href="../../../page/1/2012-03-05/01/8441330887553262.pdf">
                    <img height="16" src="../../../tplimg/pdf.gif" width="16" border="0"></a>
            </td>
        </tr>
        <tr>
            <td align="left" bgcolor="#FEE7B1" class="ft12h">
                <a id="pageLink" href="node_5.htm" class="ft12h">第02版:要闻 </a>
            </td>
            <td width="16" align="middle" nowrap bgcolor="#FEE7B1">
                <a href="../../../page/1/2012-03-05/02/38211330879946762.pdf">
                    <img height="16" src="../../../tplimg/pdf.gif" width="16" border="0" /></a>
            </td>
        </tr>
    </tbody>
</table>
I want to extract the contents of this table from the HTML. How can I do it with a regular expression? Thanks.

Solutions

  1.   

    Do you want the contents of the td cells? If so:
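                // Needs: using System; using System.IO; using System.Text; using System.Text.RegularExpressions;
                // On .NET Core / .NET 5+, "gb2312" is only available after registering the code-page provider:
                // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
                string str = File.ReadAllText(@"E:\1.txt", Encoding.GetEncoding("gb2312"));
                // (?is) = ignore case + let '.' match newlines. Group 1 repeats once per <td>,
                // skipping an optional <a ...> ... </a> wrapper around the cell content.
                Regex reg = new Regex(@"(?is)<table[^>]*?class=""ft12r""[^>]*?>(?:.*?<td[^>]*?>(?:\s*<a[^>]*?>\s*)?(.*?)(?:\s*</a>\s*)?</td>)+.*?</table>");
                foreach (Match m in reg.Matches(str))
                    // Groups[1].Value holds only the last td; Captures holds one entry per td.
                    foreach (Capture c in m.Groups[1].Captures)
                        Console.WriteLine(c.Value);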
    /*
    第01版:要闻
    <img height="16" src="../../../tplimg/pdf.gif" width="16" border="0">
    第02版:要闻
    <img height="16" src="../../../tplimg/pdf.gif" width="16" border="0" />
    */
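The regex above returns every td, so the second cell of each row comes back as the raw <img> tag. If what you actually need is each page title paired with its PDF link, a per-row match may be simpler. This is only a sketch: it assumes every row looks like the two in the question (an anchor with id="pageLink" first, then an anchor whose href ends in .pdf), and the names PageLinkExtractor and ExtractRows are hypothetical. Regular expressions stay fragile against markup that differs from this shape.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PageLinkExtractor
{
    // Hypothetical helper. Assumes each <tr> holds an <a id="pageLink"> (page title)
    // followed by an <a href="....pdf">; one regex match per such row.
    private static readonly Regex RowRegex = new Regex(
        @"<tr[^>]*>\s*<td[^>]*>\s*<a[^>]*\bid=""pageLink""[^>]*>\s*(?<title>.*?)\s*</a>.*?<a\s+href=""(?<pdf>[^""]+\.pdf)"".*?</tr>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline); // Singleline: '.' also matches newlines

    public static List<(string Title, string Pdf)> ExtractRows(string html)
    {
        var rows = new List<(string Title, string Pdf)>();
        foreach (Match m in RowRegex.Matches(html))
            rows.Add((m.Groups["title"].Value, m.Groups["pdf"].Value)); // title is trimmed by the \s* around it
        return rows;
    }

    public static void Main()
    {
        // Shortened copy of the table from the question.
        string html = @"<table class=""ft12r""><tbody>
<tr><td class=""ft12h""><a id=""pageLink"" href=""node_4.htm"" class=""ft12h"">第01版:要闻</a></td>
<td><a href=""../../../page/1/2012-03-05/01/8441330887553262.pdf""><img src=""../../../tplimg/pdf.gif""></a></td></tr>
<tr><td class=""ft12h""><a id=""pageLink"" href=""node_5.htm"" class=""ft12h"">第02版:要闻 </a></td>
<td><a href=""../../../page/1/2012-03-05/02/38211330879946762.pdf""><img src=""../../../tplimg/pdf.gif"" /></a></td></tr>
</tbody></table>";

        foreach (var (title, pdf) in ExtractRows(html))
            Console.WriteLine($"{title} -> {pdf}");
    }
}
```

For the table in the question this should print one line per row, e.g. 第01版:要闻 -> ../../../page/1/2012-03-05/01/8441330887553262.pdf. If the real pages vary more than this, an HTML parser is usually the sturdier choice.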