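I have already scraped a page and narrowed its content down to the table below. Two questions:
1) How do I extract each "英文名称" (English name) / "中文名称" (Chinese name) pair?
2) How do I extract the page numbers (page="xx")?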
<TABLE WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="5">
<TR>
<TD VALIGN="top"><TABLE WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="5">
<TR>
<TD WIDTH="4%"><IMG SRC="image/retail.gif" WIDTH="18" HEIGHT="14"> </TD>
<TD WIDTH="96%"> 英文名称:<span style="font-family:Verdana, Arial, Helvetica, sans-serif">Acid c<font color=red>ya</font>nine</span>
</TD>
</TR>
<TR>
<TD> </TD>
<TD> 中文名称:酸性花青[染料] </TD>
</TR>
<TR>
<TD> </TD>
<TD> </TD>
</TR>
<TR>
<TD WIDTH="4%"><IMG SRC="image/retail.gif" WIDTH="18" HEIGHT="14"> </TD>
<TD WIDTH="96%"> 英文名称:<span style="font-family:Verdana, Arial, Helvetica, sans-serif">Actual count(=actual <font color=red>ya</font>rn count)</span>
</TD>
</TR>
<TR>
<TD> </TD>
<TD> 中文名称:实际支数,实际纱支 </TD>
</TR>
<TR>
<TD> </TD>
<TD> </TD>
</TR>
<TR>
<TD WIDTH="4%"><IMG SRC="image/retail.gif" WIDTH="18" HEIGHT="14"> </TD>
<TD WIDTH="96%"> 英文名称:<span style="font-family:Verdana, Arial, Helvetica, sans-serif">actual <font color=red>ya</font>rn count</span>
</TD>
</TR>
<TR>
<TD> </TD>
<TD> 中文名称:实际纱支 </TD>
</TR>
<TR>
<TD> </TD>
<TD> </TD>
</TR>
<TR>
<TD WIDTH="4%"><IMG SRC="image/retail.gif" WIDTH="18" HEIGHT="14"> </TD>
<TD WIDTH="96%"> 英文名称:<span style="font-family:Verdana, Arial, Helvetica, sans-serif">Aerated <font color=red>ya</font>rn</span>
</TD>
</TR>
<TR>
<TD> </TD>
<TD> 中文名称:气泡纱[由含有气泡的纤维构成] </TD>
</TR>
<TR>
<TD> </TD>
<TD> </TD>
</TR>
<TR>
<TD WIDTH="4%"><IMG SRC="image/retail.gif" WIDTH="18" HEIGHT="14"> </TD>
<TD WIDTH="96%"> 英文名称:<span style="font-family:Verdana, Arial, Helvetica, sans-serif">agfa pol<font color=red>ya</font>mide</span>
</TD>
</TR>
<TR>
<TD> </TD>
<TD> 中文名称:阿克发聚酰胺 </TD>
</TR>
<TR>
<TD> </TD>
<TD> </TD>
</TR>
<TR>
<TD WIDTH="4%"><IMG SRC="image/retail.gif" WIDTH="18" HEIGHT="14"> </TD>
<TD WIDTH="96%"> 英文名称:<span style="font-family:Verdana, Arial, Helvetica, sans-serif">air entangled <font color=red>ya</font>rn</span>
</TD>
</TR>
<TR>
<TD> </TD>
<TD> 中文名称:气流喷射交缠丝《化纤》 </TD>
</TR>
<TR>
<TD> </TD>
<TD> </TD>
</TR>
<TR>
<TD WIDTH="4%"><IMG SRC="image/retail.gif" WIDTH="18" HEIGHT="14"> </TD>
<TD WIDTH="96%"> 英文名称:<span style="font-family:Verdana, Arial, Helvetica, sans-serif">Air jet bulky <font color=red>ya</font>rn</span>
</TD>
</TR>
<TR>
<TD> </TD>
<TD> 中文名称:喷气(法)膨松变形丝 </TD>
</TR>
<TR>
<TD> </TD>
<TD> </TD>
</TR>
<TR>
<TD WIDTH="4%"><IMG SRC="image/retail.gif" WIDTH="18" HEIGHT="14"> </TD>
<TD WIDTH="96%"> 英文名称:<span style="font-family:Verdana, Arial, Helvetica, sans-serif">air-bulked <font color=red>ya</font>rn</span>
</TD>
</TR>
<TR>
<TD> </TD>
<TD> 中文名称:喷气膨化纱 </TD>
</TR>
<TR>
<TD> </TD>
<TD> </TD>
</TR>
<TR>
<TD WIDTH="4%"><IMG SRC="image/retail.gif" WIDTH="18" HEIGHT="14"> </TD>
<TD WIDTH="96%"> 英文名称:<span style="font-family:Verdana, Arial, Helvetica, sans-serif">air-buo<font color=red>ya</font>ncy force</span>
</TD>
</TR>
<TR>
<TD> </TD>
<TD> 中文名称:空气浮力 </TD>
</TR>
<TR>
<TD> </TD>
<TD> </TD>
</TR>
</TABLE>
<center>
[上一页] <font color=red >[1]</font> <a href='/dictionary/result.asp?page=2&Blur=1&Keyword=ya' >[2]</a> <a href='/dictionary/result.asp?page=3&Blur=1&Keyword=ya' >[3]</a> <a href='/dictionary/result.asp?page=4&Blur=1&Keyword=ya' >[4]</a> <a href='/dictionary/result.asp?page=5&Blur=1&Keyword=ya' >[5]</a> <a href='/dictionary/result.asp?page=6&Blur=1&Keyword=ya' >[6]</a> <a href='/dictionary/result.asp?page=7&Blur=1&Keyword=ya' >[7]</a> <a href='/dictionary/result.asp?page=8&Blur=1&Keyword=ya' >[8]</a> <a href='/dictionary/result.asp?page=9&Blur=1&Keyword=ya' >[9]</a> <a href='/dictionary/result.asp?page=10&Blur=1&Keyword=ya' >[10]</a> <a href='/dictionary/result.asp?page=2&Blur=1&Keyword=ya' >[下一页]</a> </center>
</TD>
</TR>
</TABLE>
英文名称:Actualcount(=actualyarncount)中文名称:实际支数,实际纱支
英文名称:actualyarncount中文名称:实际纱支
英文名称:Aeratedyarn中文名称:气泡纱[由含有气泡的纤维构成]
英文名称:agfapolyamide中文名称:阿克发聚酰胺
英文名称:airentangledyarn中文名称:气流喷射交缠丝《化纤》
英文名称:Airjetbulkyyarn中文名称:喷气(法)膨松变形丝
英文名称:air-bulkedyarn中文名称:喷气膨化纱
英文名称:air-buoyancyforce中文名称:空气浮力
------------------------------------------------------------------------------
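A Form1 with one button1 and one textBox1 (the using directives go at the top of the form's code file):
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Windows.Forms;

private void button1_Click(object sender, System.EventArgs e)
{
    // Encoding.Default is the system ANSI code page on .NET Framework (e.g. GBK on Chinese Windows)
    string str1 = File.ReadAllText("1.txt", Encoding.Default);
    // Each match runs from an English-name row to the Chinese-name row that follows it
    MatchCollection str2 = Regex.Matches(str1, "<TR>.*?英文名称.*?</TR>.*?<TR>.*?中文名称.*?</TR>", RegexOptions.Singleline);
    MessageBox.Show("Found " + str2.Count + " matches");
    this.textBox1.Text = "";
    foreach (Match str in str2)
    {
        // Strip all whitespace, tags and &nbsp; entities. This also deletes the
        // spaces inside the English names, hence "Actualcount" in the output above.
        string str3 = Regex.Replace(str.Value, @"\s|<.*?>|&nbsp;", "", RegexOptions.Singleline);
        this.textBox1.Text += str3 + "\r\n";
    }
}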
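The loop above removes every whitespace character, so the spaces inside the English names are lost. A variant that keeps them captures the two names in named groups and cleans each one separately.

This is a sketch that assumes the markup looks like the table in the question: a "英文名称:" cell followed by a "中文名称:" cell, each closed by </TD>. The class and method names (TermPairExtractor, Extract) are hypothetical and do not come from the thread.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the thread): pulls (English, Chinese) name pairs out of the result table.
public static class TermPairExtractor
{
    // Assumes each "英文名称:" cell is followed by a "中文名称:" cell, each closed by </TD>.
    // Accepts both the ASCII ':' and the full-width '：' colon.
    private static readonly Regex PairPattern = new Regex(
        @"英文名称[:：](?<en>.*?)</TD>.*?中文名称[:：](?<zh>.*?)</TD>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    private static readonly Regex Tag = new Regex(@"<[^>]*>");
    private static readonly Regex Whitespace = new Regex(@"\s+");

    // Drops tags such as the <font color=red> highlight, decodes entities like &nbsp;,
    // then collapses runs of whitespace so the spaces between English words survive.
    private static string Clean(string html)
    {
        string text = WebUtility.HtmlDecode(Tag.Replace(html, ""));
        return Whitespace.Replace(text, " ").Trim();
    }

    // Returns one (English, Chinese) tuple per pair, in document order.
    public static List<(string En, string Zh)> Extract(string html)
    {
        var pairs = new List<(string En, string Zh)>();
        foreach (Match m in PairPattern.Matches(html))
        {
            pairs.Add((Clean(m.Groups["en"].Value), Clean(m.Groups["zh"].Value)));
        }
        return pairs;
    }
}
```

Calling Extract on the table from the question should give nine pairs. One of them is ("Actual count(=actual yarn count)", "实际支数,实际纱支"). Matching on the labels instead of on <TR> rows also means the outer table's first <TR> is never pulled into a match.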
The IE parser is not supported under ASP.NET.
<a href='/dictionary/result.asp?page=positive-integer&Blur=1&Keyword=any-characters' >[positive-integer]</a>
How do I express "positive integer" and "any characters" in the regex?
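<a href='/dictionary/result\.asp\?page=(\d+)&Blur=1&Keyword=[^']*' >\[\d+\]</a>
Mind the escape characters when using it. ".", "?", "[" and "]" are regex metacharacters and must be escaped with a backslash. Inside a C# string literal the backslashes must also be doubled, or you can use a verbatim @"..." string.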
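A sketch of the escaped pattern in use, collecting the linked page numbers from the pager. It assumes the pager markup looks like the one in the question. PagerParser and LinkedPages are hypothetical names.

The current page ([1]) is rendered in a <font> tag rather than a link, so it is not returned. The "[下一页]" (next page) link is skipped because its text is not a number. The largest value in the result is the last linked page.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: reads page numbers from pager links such as
// <a href='/dictionary/result.asp?page=2&Blur=1&Keyword=ya' >[2]</a>
public static class PagerParser
{
    // '.', '?', '[' and ']' are regex metacharacters, so they are escaped.
    // [^']* keeps the keyword match from running past the closing quote into the next link.
    private static readonly Regex PageLink = new Regex(
        @"<a href='/dictionary/result\.asp\?page=(?<page>\d+)&Blur=1&Keyword=[^']*'\s*>\[\d+\]</a>",
        RegexOptions.IgnoreCase);

    // Returns the linked page numbers in ascending order, without duplicates.
    public static List<int> LinkedPages(string html)
    {
        return PageLink.Matches(html)
            .Cast<Match>()
            .Select(m => int.Parse(m.Groups["page"].Value))
            .Distinct()
            .OrderBy(p => p)
            .ToList();
    }
}
```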