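I have a web page and want to extract specific content from it, namely the organization names. How should I write the regular expression? The entries follow a regular pattern: each one is listed in its own table, like the sample below, and a single page contains many of them. Any pointers from the experts would be appreciated.
I wrote one, but it doesn't work: </td>\w+</td>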
<table width="100%" border="0" cellspacing="0" cellpadding="3" align="center" id="table108">
<tr>
<td width="4%" align="center" class="uptitle">
<img src="../images/filenew.gif" width="14" height="16"></td><td valign="bottom" align="left" class="uptitle">
大连市环境保护局</td>
</tr>
</table>
<table width="100%" border="0" cellspacing="0" cellpadding="0" align="center" id="table109">
<tr>
<td></td>
</tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3" align="center" bgcolor="#bbbbbb" id="table110">
<tr>
<td bgcolor="#FFFfFf" align="left" height="30">
邮 编:116001<br>
地 址:大连市中山区华乐街1号<br>
电 话:0411-82738099 82739099<br>
传 真:0411-82738181<br>
网 址:www.dlepb.gov.cn</td>
</tr>
</table>
<table width="100%" border="0" cellspacing="0" cellpadding="3" align="center" id="table114">
<tr>
<td width="4%" align="center" class="uptitle">
<img src="../images/filenew.gif" width="14" height="16"></td><td valign="bottom" align="left" class="uptitle">
大连市环境保护局西岗分局</td>
</tr>
</table>
<table width="100%" border="0" cellspacing="0" cellpadding="0" align="center" id="table115">
<tr>
<td></td>
</tr>
</table>
(?<=<td\s+.*?>)[\s\S]*?(?=</td>)
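Refer to the following code:
// Requires: using System.Text.RegularExpressions;
private void button1_Click(object sender, EventArgs e)
{
    string s = textBox1.Text; // HTML source of the page
    // Match each table whose single row is an <img> cell followed by a name cell,
    // and capture the name cell's text in the named group "Company".
    foreach (Match vMatch in Regex.Matches(s,
        @"<table[^>]*>\s*<tr>\s*<td[^>]*>\s*<img[^>]*>\s*" +
        @"</td>\s*<td[^>]*>\s*(?<Company>\w+)\s*</td>\s*</tr>\s*</table>"))
    {
        Console.WriteLine(vMatch.Result("${Company}"));
    }
}
Output: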
[code=BatchFile]大连市环境保护局
大连市环境保护局西岗分局[/code]
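A note on why the first attempt, </td>\w+</td>, finds nothing: in the sample, each name is preceded by the opening <td ...> tag and a line break, not by a closing </td>. As a possible alternative to matching the whole table, the sketch below anchors on the class="uptitle" attribute of the name cell. It assumes that every name cell carries that class and contains plain text only, as in the sample; the class name UptitleDemo and the trimmed input string are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class UptitleDemo
{
    static void Main()
    {
        // Input trimmed from the sample page in the question (assumed representative).
        string html =
            "<td width=\"4%\" align=\"center\" class=\"uptitle\">\n" +
            "<img src=\"../images/filenew.gif\" width=\"14\" height=\"16\"></td>" +
            "<td valign=\"bottom\" align=\"left\" class=\"uptitle\">\n" +
            "大连市环境保护局</td>\n" +
            "<td bgcolor=\"#FFFfFf\" align=\"left\" height=\"30\">\n" +
            "邮 编:116001<br></td>";

        // Find a <td> whose attributes include class="uptitle", then capture
        // its text. [^<>]+? cannot cross a tag, so the cell that holds only
        // the <img> is skipped, and the lazy quantifier leaves surrounding
        // whitespace to the \s* on either side.
        Regex nameCell = new Regex(
            @"<td[^>]*class=""uptitle""[^>]*>\s*(?<Company>[^<>]+?)\s*</td>");

        foreach (Match m in nameCell.Matches(html))
        {
            Console.WriteLine(m.Groups["Company"].Value);
        }
    }
}
```

Unlike \w+, the [^<>]+? group would also accept names containing spaces or parentheses, which may or may not be wanted depending on the rest of the page.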