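I have fetched a page and narrowed its content down to the table below. Two questions:
1) How do I extract each pair of "英文名称" (English name) and "中文名称" (Chinese name)?
2) How do I extract the page numbers (page="xx")?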
<TABLE WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="5">
                  <TR>
                    <TD VALIGN="top"><TABLE WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="5">
                        
                                                
                        <TR>
                          <TD WIDTH="4%"><IMG SRC="image/retail.gif" WIDTH="18" HEIGHT="14"> </TD>
                          <TD WIDTH="96%"> 英文名称:<span style="font-family:Verdana, Arial, Helvetica, sans-serif">Acid c<font color=red>ya</font>nine</span> 
                            
                          </TD>
                        </TR>
                        <TR>
                          <TD>&nbsp;</TD>
                          <TD> 中文名称:酸性花青[染料] </TD>
                        </TR>
                        <TR>
                          <TD>&nbsp;</TD>
                          <TD>&nbsp;</TD>
                        </TR>
                        
                        <TR>
                          <TD WIDTH="4%"><IMG SRC="image/retail.gif" WIDTH="18" HEIGHT="14"> </TD>
                          <TD WIDTH="96%"> 英文名称:<span style="font-family:Verdana, Arial, Helvetica, sans-serif">Actual count(=actual <font color=red>ya</font>rn count)</span> 
                            
                          </TD>
                        </TR>
                        <TR>
                          <TD>&nbsp;</TD>
                          <TD> 中文名称:实际支数,实际纱支 </TD>
                        </TR>
                        <TR>
                          <TD>&nbsp;</TD>
                          <TD>&nbsp;</TD>
                        </TR>
                        
                        <TR>
                          <TD WIDTH="4%"><IMG SRC="image/retail.gif" WIDTH="18" HEIGHT="14"> </TD>
                          <TD WIDTH="96%"> 英文名称:<span style="font-family:Verdana, Arial, Helvetica, sans-serif">actual <font color=red>ya</font>rn count</span> 
                            
                          </TD>
                        </TR>
                        <TR>
                          <TD>&nbsp;</TD>
                          <TD> 中文名称:实际纱支  </TD>
                        </TR>
                        <TR>
                          <TD>&nbsp;</TD>
                          <TD>&nbsp;</TD>
                        </TR>
                        
                        <TR>
                          <TD WIDTH="4%"><IMG SRC="image/retail.gif" WIDTH="18" HEIGHT="14"> </TD>
                          <TD WIDTH="96%"> 英文名称:<span style="font-family:Verdana, Arial, Helvetica, sans-serif">Aerated <font color=red>ya</font>rn</span> 
                            
                          </TD>
                        </TR>
                        <TR>
                          <TD>&nbsp;</TD>
                          <TD> 中文名称:气泡纱[由含有气泡的纤维构成] </TD>
                        </TR>
                        <TR>
                          <TD>&nbsp;</TD>
                          <TD>&nbsp;</TD>
                        </TR>
                        
                        <TR>
                          <TD WIDTH="4%"><IMG SRC="image/retail.gif" WIDTH="18" HEIGHT="14"> </TD>
                          <TD WIDTH="96%"> 英文名称:<span style="font-family:Verdana, Arial, Helvetica, sans-serif">agfa pol<font color=red>ya</font>mide</span> 
                            
                          </TD>
                        </TR>
                        <TR>
                          <TD>&nbsp;</TD>
                          <TD> 中文名称:阿克发聚酰胺  </TD>
                        </TR>
                        <TR>
                          <TD>&nbsp;</TD>
                          <TD>&nbsp;</TD>
                        </TR>
                        
                        <TR>
                          <TD WIDTH="4%"><IMG SRC="image/retail.gif" WIDTH="18" HEIGHT="14"> </TD>
                          <TD WIDTH="96%"> 英文名称:<span style="font-family:Verdana, Arial, Helvetica, sans-serif">air entangled <font color=red>ya</font>rn</span> 
                            
                          </TD>
                        </TR>
                        <TR>
                          <TD>&nbsp;</TD>
                          <TD> 中文名称:气流喷射交缠丝《化纤》  </TD>
                        </TR>
                        <TR>
                          <TD>&nbsp;</TD>
                          <TD>&nbsp;</TD>
                        </TR>
                        
                        <TR>
                          <TD WIDTH="4%"><IMG SRC="image/retail.gif" WIDTH="18" HEIGHT="14"> </TD>
                          <TD WIDTH="96%"> 英文名称:<span style="font-family:Verdana, Arial, Helvetica, sans-serif">Air jet bulky <font color=red>ya</font>rn</span> 
                            
                          </TD>
                        </TR>
                        <TR>
                          <TD>&nbsp;</TD>
                          <TD> 中文名称:喷气(法)膨松变形丝 </TD>
                        </TR>
                        <TR>
                          <TD>&nbsp;</TD>
                          <TD>&nbsp;</TD>
                        </TR>
                        
                        <TR>
                          <TD WIDTH="4%"><IMG SRC="image/retail.gif" WIDTH="18" HEIGHT="14"> </TD>
                          <TD WIDTH="96%"> 英文名称:<span style="font-family:Verdana, Arial, Helvetica, sans-serif">air-bulked <font color=red>ya</font>rn</span> 
                            
                          </TD>
                        </TR>
                        <TR>
                          <TD>&nbsp;</TD>
                          <TD> 中文名称:喷气膨化纱  </TD>
                        </TR>
                        <TR>
                          <TD>&nbsp;</TD>
                          <TD>&nbsp;</TD>
                        </TR>
                        
                        <TR>
                          <TD WIDTH="4%"><IMG SRC="image/retail.gif" WIDTH="18" HEIGHT="14"> </TD>
                          <TD WIDTH="96%"> 英文名称:<span style="font-family:Verdana, Arial, Helvetica, sans-serif">air-buo<font color=red>ya</font>ncy force</span> 
                            
                          </TD>
                        </TR>
                        <TR>
                          <TD>&nbsp;</TD>
                          <TD> 中文名称:空气浮力  </TD>
                        </TR>
                        <TR>
                          <TD>&nbsp;</TD>
                          <TD>&nbsp;</TD>
                        </TR>
                        
                      </TABLE>
                        <center>
[上一页] <font color=red >[1]</font> <a href='/dictionary/result.asp?page=2&Blur=1&Keyword=ya' >[2]</a> <a href='/dictionary/result.asp?page=3&Blur=1&Keyword=ya' >[3]</a> <a href='/dictionary/result.asp?page=4&Blur=1&Keyword=ya' >[4]</a> <a href='/dictionary/result.asp?page=5&Blur=1&Keyword=ya' >[5]</a> <a href='/dictionary/result.asp?page=6&Blur=1&Keyword=ya' >[6]</a> <a href='/dictionary/result.asp?page=7&Blur=1&Keyword=ya' >[7]</a> <a href='/dictionary/result.asp?page=8&Blur=1&Keyword=ya' >[8]</a> <a href='/dictionary/result.asp?page=9&Blur=1&Keyword=ya' >[9]</a> <a href='/dictionary/result.asp?page=10&Blur=1&Keyword=ya' >[10]</a> <a href='/dictionary/result.asp?page=2&Blur=1&Keyword=ya' >[下一页]</a> </center>
                    </TD>
                  </TR>
                </TABLE>

Solutions »

  1.   

    http://msdn2.microsoft.com/EN-US/library/aa704078.aspx
      

  2.   

    Debug output under VS 2003:
    英文名称:Acidcyanine中文名称:酸性花青[染料]
    英文名称:Actualcount(=actualyarncount)中文名称:实际支数,实际纱支
    英文名称:actualyarncount中文名称:实际纱支
    英文名称:Aeratedyarn中文名称:气泡纱[由含有气泡的纤维构成]
    英文名称:agfapolyamide中文名称:阿克发聚酰胺
    英文名称:airentangledyarn中文名称:气流喷射交缠丝《化纤》
    英文名称:Airjetbulkyyarn中文名称:喷气(法)膨松变形丝
    英文名称:air-bulkedyarn中文名称:喷气膨化纱
    英文名称:air-buoyancyforce中文名称:空气浮力
    ------------------------------------------------------------------------------
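    One Form1, one button1, one textBox1:
    // Needs: using System.IO; using System.Text; using System.Text.RegularExpressions;
    private void button1_Click(object sender, System.EventArgs e)
    {
        // Read the saved page fragment (1.txt) using the system default encoding.
        string str1;
        using (StreamReader f1 = new StreamReader("1.txt", Encoding.Default))
        {
            str1 = f1.ReadToEnd();
        }
        this.textBox1.Text = str1;

        // One match per pair: from the 英文名称 (English name) row to the end of the 中文名称 (Chinese name) row.
        MatchCollection str2 = Regex.Matches(str1, "<TR>.*?英文名称.*?</TR>.*?<TR>.*?中文名称.*?</TR>", RegexOptions.Singleline);
        MessageBox.Show("Found " + str2.Count.ToString() + " matches");
        this.textBox1.Text = "";
        foreach (Match str in str2)
        {
            // Strip whitespace, tags and &nbsp;. Note that \s also removes the spaces inside the names.
            string str3 = Regex.Replace(str.ToString(), "(\\s)|(<.*?>)|(&nbsp;)", "", RegexOptions.Singleline);
            this.textBox1.Text += str3 + "\r\n";
        }
    }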
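
The output above loses the spaces inside each name ("Acidcyanine") because every whitespace character is deleted. A possible variant that keeps the two names apart and preserves inner spaces is sketched below. The names NamePairExtractor, Extract and Clean are made up for this example, and it assumes that each label is followed by a colon (half-width or full-width) and that the value ends at the cell's closing </TD>. It uses List<T>, so it needs .NET 2.0 or later and would not compile under VS 2003 as written.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamePairExtractor
{
    // Capture the text after each label up to the closing </TD> of its cell.
    // [::] accepts either a half-width or a full-width colon (an assumption about the page).
    private static readonly Regex PairPattern = new Regex(
        @"英文名称[::](?<en>.*?)</TD>.*?中文名称[::](?<zh>.*?)</TD>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Remove tags and &nbsp;, then collapse whitespace runs to one space instead of deleting them.
    private static string Clean(string fragment)
    {
        string noTags = Regex.Replace(fragment, @"<.*?>|&nbsp;", "", RegexOptions.Singleline);
        return Regex.Replace(noTags, @"\s+", " ").Trim();
    }

    // Returns one { english, chinese } array per entry found in the HTML.
    public static List<string[]> Extract(string html)
    {
        List<string[]> pairs = new List<string[]>();
        foreach (Match m in PairPattern.Matches(html))
        {
            pairs.Add(new string[] { Clean(m.Groups["en"].Value), Clean(m.Groups["zh"].Value) });
        }
        return pairs;
    }
}
```

In the click handler this could replace the foreach loop, for example by calling NamePairExtractor.Extract(str1) and appending pair[0] and pair[1] to textBox1 for each result.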
      

  3.   

    As for extracting the page number (page="xx"): once the names above can be extracted, this one should be easy.
      

  4.   

    I misread the question.
    The IE parser is not supported under ASP.NET.
      

  5.   

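    How should the regex for this be written:
    <a href='/dictionary/result.asp?page=positive integer&Blur=1&Keyword=any characters' >[positive integer]</a>
    How do I express "positive integer" and "any characters"?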
      

  6.   

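    · Shorthand character classes
    Some character classes are used so often that they have shorthand forms. <<\d>> stands for <<[0-9]>>; <<\w>> stands for a word character, which varies somewhat between regex implementations. In most implementations the word character class includes <<[A-Za-z0-9_]>>. <<\s>> stands for a whitespace character; this also depends on the implementation. In most implementations it includes the space and tab characters as well as carriage return and line feed, <<\r\n>>. Shorthand classes can be used inside or outside square brackets. <<\s\d>> matches one whitespace character followed immediately by a digit. <<[\s\d]>> matches a single whitespace character or a single digit. <<[\da-fA-F]>> matches one hexadecimal digit.
    · Repeating character classes
    If you repeat a character class with the "?", "*" or "+" operators, you repeat the whole class, not just the character it matched. The regex <<[0-9]+>> matches 837 as well as 222. If you only want to repeat the character that was actually matched, use a backreference. Backreferences are covered later.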
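
The difference between repeating a class and repeating the matched character can be checked with a small helper. This is only a sketch: CharClassDemo and IsFullMatch are names invented for the example, and the helper anchors the pattern so that the whole input has to match. Note that in .NET, <<\d>> by default also matches decimal digits from other scripts unless RegexOptions.ECMAScript is set, which is one of the implementation differences mentioned above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // True when the entire input matches the pattern.
    // \A and \z anchor at the very start and very end of the string.
    public static bool IsFullMatch(string input, string pattern)
    {
        return Regex.IsMatch(input, @"\A(?:" + pattern + @")\z");
    }
}
```

With this helper, IsFullMatch("837", @"[0-9]+") and IsFullMatch("222", @"[0-9]+") are both true, because "+" repeats the whole class. With the backreference pattern @"([0-9])\1+", only "222" matches, because \1 must repeat the digit that the group captured.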
      

  7.   

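    How should the regex for this be written:
    <a href='/dictionary/result.asp?page=positive integer&Blur=1&Keyword=any characters' >[positive integer]</a>
    How do I express "positive integer" and "any characters"?
    <a href='/dictionary/result\.asp\?page=(\d+)&Blur=1&Keyword=.*?' >\[(\d+)\]</a>
    Here (\d+) captures the whole positive integer and .*? stands for any characters. Mind the escaping when you use it: the regex metacharacters ".", "?", "[" and "]" need a backslash as shown, and inside a regular C# string literal each backslash must itself be doubled (or use an @"..." verbatim string).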
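
Applied to the pager row of the table in the question, the pattern could be used roughly as follows. This is a sketch under assumptions: PagerParser, PageNumbers and LastLinkedPage are hypothetical names, the link format is taken to be exactly the one in the sample HTML, and the largest linked number is only the last page shown in this pager, which is not necessarily the total page count. It also uses List<T>, so it needs .NET 2.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PagerParser
{
    // Escaped form of the pager link. The "page" group holds the number from the query string;
    // [^']* accepts any keyword up to the closing quote of the href.
    private static readonly Regex PageLink = new Regex(
        @"<a\s+href='/dictionary/result\.asp\?page=(?<page>\d+)&Blur=1&Keyword=[^']*'\s*>\[(?<label>\d+)\]</a>",
        RegexOptions.IgnoreCase);

    // Page numbers of all numeric pager links, in document order.
    // Links such as [下一页] (next page) are skipped because their label is not a number.
    public static List<int> PageNumbers(string html)
    {
        List<int> pages = new List<int>();
        foreach (Match m in PageLink.Matches(html))
        {
            pages.Add(int.Parse(m.Groups["page"].Value));
        }
        return pages;
    }

    // Largest page number that appears as a link, or 0 when no link is found.
    public static int LastLinkedPage(string html)
    {
        int max = 0;
        foreach (int page in PageNumbers(html))
        {
            if (page > max) max = page;
        }
        return max;
    }
}
```

The current page is rendered as <font color=red >[1]</font> rather than as a link, so it does not appear in the result of PageNumbers.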