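There are the following two kinds of tr:
1.
<tr xxx>
<td xxx>(needed content)</td>
<td xxx>(needed content)</td>
<td xxx>(needed content)</td>
... (i td elements in total; i is a fixed value)
</tr>
2.
<tr xxx>
<td xxx>(needed content)</td>
... l simple td elements up to here; l is fixed
<td xxx>
<table>
<tr>
<td xxx>(needed content)</td>
</tr>
</table>
</td>
<td xxx>(needed content)</td>
... m more simple td elements
<td xxx>
<table>
<tr>
<td xxx>(needed content)</td>
</tr>
</table>
</td>
<td xxx>(needed content)</td>
... n simple td elements
</tr>
l+m+n+2=i
In other words, among the same i td elements, the td at each of two fixed positions contains a table, and each of those tables has exactly one tr with one td. Within a single outer table, the two kinds of tr above are mixed together in random numbers and random order.
How should I write a regular expression to extract the needed parts? At the moment, using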
<tr[^>]*>((?:.*?(?=<table|</tr>)(?(<table)<table[^>]*>.*?</table>.*?|))*?)</tr>
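I can get everything between the tags of one tr, but extracting the content of each td inside it requires a second regex pass, which is rather cumbersome. Is there a way to get all the needed content of one tr directly? Thanks!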

Solutions »

  1.   

    I don't quite follow what you mean; see whether this gives the result you want.
    string yourStr = .........;
    MatchCollection mc = Regex.Matches(yourStr, @"<td[^>]*>(?<content>[^<>]*)</td>", RegexOptions.IgnoreCase);
    foreach (Match m in mc)
    {
        richTextBox2.Text += m.Groups["content"].Value + "\n";
    }
      

  2.   

    string s=@"<tr height=20 align=center>
                        <td width=3% align='center'>
                          &nbsp;&nbsp;
                          </td>
                        <td width=5% align='center'>a1</td>
                        <td width=20% align='left'>&nbsp;<a href='/index.asp' target=_blank>b1</a>
                            </td>
                        <td width=32% align='left'>
                          <a href='/default.aspx' target=_blank><font color=#ffffff>c1</font></a>
                          </td>
                        <td width=8% align='center'></td>
                        <td width=12% align='left'><a href='/index.htm' target=_blank>d1</a></td>
                        <td width=10% align='center'>e1</td>
                        <td width=6% align='center'>f1</td>
                        <td width=4% align='center'><a href='javascript:void();' >g1</a></td>
                        <form name='form1' method='post' 
    action='add.asp' target='_blank'><td width='4%' align='center'>
    h1</td></form>
                      </tr>
    <tr height=20 align=center>
                        <td width=3% align='center'>
                          &nbsp;&nbsp;
                          </td>
                        <td width=5% align='center'>a2</td>
                        <td width=20% align='left'>&nbsp;<a href='/index.asp' target=_blank>b2</a>
                            </td>
                        <td width=32% align='left'>
                          <a href='/default.aspx' target=_blank><font color=#ffffff>c2</font></a>
                          </td>
                        <td width=8% align='center'></td>
                        <td width=12% align='left'><a href='/index.htm' target=_blank>d2</a></td>
                        <td width=10% align='center'>e2</td>
                        <td width=6% align='center'>f2</td>
                        <td width=4% align='center'><a href='javascript:void();' >g2</a></td>
                        <form name='form1' method='post' 
    action='add.asp' target='_blank'><td width='4%' align='center'>
    h2</td></form>
                      </tr>
    <tr height=20 align=center>
                        <td width=3% align='center'>
                          &nbsp;&nbsp;
                          </td>
                        <td width=5% align='center'>a3</td>
                        <td width=20% align='left'>&nbsp;<a href='/index.asp' target=_blank>b3</a>
                            </td>
                        <td width=32% align='left'>
                          <a href='/default.aspx' target=_blank><font color=#ffffff>c3</font></a>
                          </td>
                        <td width=8% align='center'></td>
                        <td width=12% align='left'><a href='/index.htm' target=_blank>d3</a></td>
                        <td width=10% align='center'>e3</td>
                        <td width=6% align='center'>f3</td>
                        <td width=4% align='center'><a href='javascript:void();' >g3</a></td>
                        <form name='form1' method='post' 
    action='add.asp' target='_blank'><td width='4%' align='center'>
    h3</td></form>
                      </tr>";
    string p=@"<tr[^>]*?>(?:(?!<td).*?<td[^>]*?>((?!</td>).*?)</td>){9}(?!</tr>).*?</tr>";
    Regex r=new Regex(p,RegexOptions.Singleline | RegexOptions.IgnoreCase);
    Match m=r.Match(s);
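    while(m.Success)
    {
        foreach(Capture c in m.Groups[1].Captures)
        {
            Console.WriteLine(c.Value);
        }
        m=m.NextMatch();
    }
    ==========================================================
    As above, each match yields all the values a1-h1 (there should actually be 10 captures, but this pattern has a problem: the one empty cell is not matched, so I changed the count to {9} for now and am posting it as is).
    But if one of the cells, for example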
                          <td width=5% align='center'>a3</td>
    is changed to
                          <td width=5% align='center'>
                            <a href='default.asp'>a3</font></a>
                            <div id="Layer3" style="width:25px">
                              <table width="100%" border="0" bgcolor=#ffffff style="border:1px solid black;">
                                <tr>
                                  <td class=c1>i3(forgot to mention: this one is needed too when there are nested tables)<br></td>
                                </tr>
                              </table>
                            </div>
                          </td>
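    then the pattern no longer works, but I would still like to get a1-h1 (and possibly i1, j1) in a single match, as above.
    (In the real data it is the 4th and 5th td that may take the form above, and when they change, both change together.)
    I hope this makes it clear enough.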
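
    A note on the `{9}` workaround above: in `((?!</td>).*?)` the lookahead is tested only once, at the start of the cell. For an empty `<td ...></td>` it fails immediately, so the lazy `.*?` in front of it skips ahead to the next `<td`. The following is a minimal sketch, not the poster's code, that applies the lookahead at every character instead, so an empty cell yields an empty capture and the full cell count can be used. `FlatRows` and `CellsPerRow` are hypothetical names, and the sketch assumes flat rows only (no nested tables).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FlatRows
{
    // Hypothetical helper: returns the raw content of every cell, one list per <tr>.
    // Assumption: rows are flat (no nested tables) and hold exactly cellsPerRow cells.
    public static List<List<string>> CellsPerRow(string html, int cellsPerRow)
    {
        string pattern =
            @"<tr\b[^>]*>" +
            // Skip anything before the next cell (e.g. a <form> tag) without leaving the row,
            // then capture the cell content; (?:(?!</td>).)* may match the empty string.
            @"(?:(?:(?!</tr>).)*?<td\b[^>]*>((?:(?!</td>).)*)</td>){" + cellsPerRow + "}" +
            @"(?:(?!</tr>).)*</tr>";
        var regex = new Regex(pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);

        var rows = new List<List<string>>();
        foreach (Match m in regex.Matches(html))
        {
            var cells = new List<string>();
            // Group 1 sits inside a repeated group, so Captures holds one entry per cell.
            foreach (Capture c in m.Groups[1].Captures)
            {
                cells.Add(c.Value);
            }
            rows.Add(cells);
        }
        return rows;
    }
}
```

    Called as `FlatRows.CellsPerRow(s, 10)` on the three flat rows above, this should return three lists of ten entries each, with an empty string for the empty cell.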
      

  3.   

    string yourStr = .........;
    MatchCollection mc = Regex.Matches(yourStr, @"<td[^>]*>(?<content>[\s\S]*?)</td>", RegexOptions.IgnoreCase);
    foreach (Match m in mc)
    {
        richTextBox2.Text += m.Groups["content"].Value + "\n";
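    }
    For the example in your post above, this extracts 30 items. See whether that is the result you want; if it is not, just tell me the result you need.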
      

  4.   

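    Yes, 30 is correct, but the case I mentioned afterwards is still not solved, as follows. What I need to get is:
    a1-h1 plus one blank
    a2-k2
    a3-k3
    The expression you provided gives incorrect results for c2, c3, j2, j3, i2, i3, k2 and k3.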
    string s=@"<tr height=20 align=center>
                        <td width=3% align='center'>
                          &nbsp;&nbsp;
                          </td>
                        <td width=5% align='center'>a1</td>
                        <td width=20% align='left'>&nbsp;<a href='/index.asp' target=_blank>b1</a>
                            </td>
                        <td width=32% align='left'>
                          <a href='/default.aspx' target=_blank><font color=#ffffff>c1</font></a>
                          </td>
                        <td width=8% align='center'></td>
                        <td width=12% align='left'><a href='/index.htm' target=_blank>d1</a></td>
                        <td width=10% align='center'>e1</td>
                        <td width=6% align='center'>f1</td>
                        <td width=4% align='center'><a href='javascript:void();' >g1</a></td>
                        <form name='form1' method='post' 
    action='add.asp' target='_blank'><td width='4%' align='center'>
    h1</td></form>
                      </tr>
    <tr height=20 align=center>
                        <td width=3% align='center'>
                          &nbsp;&nbsp;
                          </td>
                        <td width=5% align='center'>a2</td>
                        <td width=20% align='left'>&nbsp;<a href='/index.asp' target=_blank>b2</a>
                            </td>
                         <td width=5% align='center'>
                            <a href='default.asp'>c2</font></a>
                            <div id='Layer3' style='width:25px'>
                              <table width='100%' border='0' bgcolor=#ffffff style='border:1px solid black;'>
                                <tr>
                                  <td class=c1>i2(forgot to mention: this one is needed too when there are nested tables)<br/></td>
                                </tr>
                              </table>
                            </div>
                          </td>
                         <td width=5% align='center'>
                            <a href='default.asp'>j2</font></a>
                            <div id='Layer3' style='width:25px'>
                              <table width='100%' border='0' bgcolor=#ffffff style='border:1px solid black;'>
                                <tr>
                                  <td class=c1>k2(forgot to mention: this one is needed too when there are nested tables)<br/></td>
                                </tr>
                              </table>
                            </div>
                          </td>
                        <td width=12% align='left'><a href='/index.htm' target=_blank>d2</a></td>
                        <td width=10% align='center'>e2</td>
                        <td width=6% align='center'>f2</td>
                        <td width=4% align='center'><a href='javascript:void();' >g2</a></td>
                        <form name='form1' method='post' 
    action='add.asp' target='_blank'><td width='4%' align='center'>
    h2</td></form>
                      </tr>
    <tr height=20 align=center>
                        <td width=3% align='center'>
                          &nbsp;&nbsp;
                          </td>
                        <td width=5% align='center'>a3</td>
                        <td width=20% align='left'>&nbsp;<a href='/index.asp' target=_blank>b3</a>
                            </td>
                        <td width=5% align='center'>
                            <a href='default.asp'>c3</font></a>
                            <div id='Layer3' style='width:25px'>
                              <table width='100%' border='0' bgcolor=#ffffff style='border:1px solid black;'>
                                <tr>
                                  <td class=c1>i3(forgot to mention: this one is needed too when there are nested tables)<br/></td>
                                </tr>
                              </table>
                            </div>
                          </td>
                         <td width=5% align='center'>
                            <a href='default.asp'>j3</font></a>
                            <div id='Layer3' style='width:25px'>
                              <table width='100%' border='0' bgcolor=#ffffff style='border:1px solid black;'>
                                <tr>
                                  <td class=c1>k3(forgot to mention: this one is needed too when there are nested tables)<br/></td>
                                </tr>
                              </table>
                            </div>
                          </td>
                        <td width=12% align='left'><a href='/index.htm' target=_blank>d3</a></td>
                        <td width=10% align='center'>e3</td>
                        <td width=6% align='center'>f3</td>
                        <td width=4% align='center'><a href='javascript:void();' >g3</a></td>
                        <form name='form1' method='post' 
    action='add.asp' target='_blank'><td width='4%' align='center'>
    h3</td></form>
                      </tr>";
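    One more note: this regex is one part of a larger expression whose other parts exist to isolate this section, so it can only match once. Ideally a single match would capture everything; I don't know whether that is possible...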
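
    One possible way to get everything from a single match in .NET is to repeat a capturing group and read its `Captures`, while handling the optional nested table inside the repeated part. The sketch below is illustrative only: `RowCaptures`, `Extract`, `Clean` and the group names `cell` and `inner` are hypothetical, and it assumes each cell holds at most one nested table with a single inner cell, as described in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RowCaptures
{
    // One match per outer <tr>; every cell value is stored as a capture of "cell" or "inner".
    private static readonly Regex Row = new Regex(@"
        <tr\b[^>]*>
        (?:
            (?:(?!<td\b|</tr>).)*                  # skip anything before the next cell
            <td\b[^>]*>
            (?<cell>(?:(?!<table\b|</td>).)*)      # outer content up to a nested table or the cell end
            (?:
                <table\b[^>]*>.*?<td\b[^>]*>
                (?<inner>.*?)                      # content of the single nested cell
                </td>.*?</table>
                (?:(?!</td>).)*                    # rest of the outer cell after the nested table
            )?
            </td>
        )+
        (?:(?!</tr>).)*</tr>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace);

    // Remove tags and &nbsp; entities, then trim surrounding whitespace.
    private static string Clean(string raw)
    {
        return Regex.Replace(raw, @"<[^>]*>", "").Replace("&nbsp;", "").Trim();
    }

    // Returns one list of cleaned values per outer row, in document order.
    public static List<List<string>> Extract(string html)
    {
        var rows = new List<List<string>>();
        foreach (Match m in Row.Matches(html))
        {
            // Merge the two capture lists and restore document order by position.
            List<string> values = m.Groups["cell"].Captures.Cast<Capture>()
                .Concat(m.Groups["inner"].Captures.Cast<Capture>())
                .OrderBy(c => c.Index)
                .Select(c => Clean(c.Value))
                .ToList();
            rows.Add(values);
        }
        return rows;
    }
}
```

    Sorting by `Capture.Index` is a design choice that keeps each nested value right after the outer cell it belongs to; the pattern has not been checked against the real page, so treat it as a starting point.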
      

  5.   

    string result = "";
    MatchCollection mc = Regex.Matches(yourStr, @"<td[^>]*>(?<content>[\s\S]*?)</td>", RegexOptions.IgnoreCase);
    foreach (Match m in mc)
    {
        result = Regex.Replace(m.Groups["content"].Value, @"<[^>]*>", "");
        result = result.Replace("&nbsp;", "");
        result = result.Trim();
        richTextBox2.Text += result + "\n";
    }
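
    With nested tables, the lazy `[\s\S]*?` in this pattern stops at the inner `</td>`, so an outer cell and its nested cell come out merged into one value (for example c2 together with i2). A minimal variant, assuming any text that follows the nested table inside the outer cell can be ignored, stops each capture at the next `<td` or `</td>`, so outer and inner cells become separate matches. `CellText` and `Extract` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CellText
{
    // Hypothetical helper: returns the text of every <td>, outer and nested, in document order.
    public static List<string> Extract(string html)
    {
        // Capture from each <td ...> up to the next <td or </td>, whichever comes first.
        var cell = new Regex(@"<td\b[^>]*>(?<content>(?:(?!</?td\b).)*)",
                             RegexOptions.IgnoreCase | RegexOptions.Singleline);

        var values = new List<string>();
        foreach (Match m in cell.Matches(html))
        {
            // Same clean-up as in the loop above: drop tags and &nbsp;, then trim.
            string text = Regex.Replace(m.Groups["content"].Value, @"<[^>]*>", "");
            text = text.Replace("&nbsp;", "").Trim();
            values.Add(text);
        }
        return values;
    }
}
```

    This still uses one match per cell rather than one match per row, so the values have to be regrouped into rows afterwards if that grouping matters.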
      

  6.   

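    Very close, but there are still some problems.
    It looks like solving this with a single match would be rather troublesome, and even if it could be done the expression would be too complex, so I'll just write a few more loops. Thanks to lxcnn(过客) for the help.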