I have a bit of HTML:
<tr bgcolor="#f1f5fa">
<td nowrap class="smalltext">May 2010</td>
<td class="smalltext">United Kingdom</td>
<td class="smalltext">
<a href="/WebObjects/iTunesConnect.woa/wo/3.0.0.5.1.7.1.1.2.15.0.5.1">
80075911_0510_GB_PYMT.txt
</a>
</td>
</tr>
<tr bgcolor="#f1f5fa">
<td nowrap class="smalltext">May 2010</td>
<td class="smalltext">United Kingdom</td>
<td class="smalltext">
<a href="/WebObjects/iTunesConnect.woa/wo/3.0.0.5.1.7.1.1.2.15.1.5.1">
80075911_0510_GB.txt
</a>
</td>
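</tr>
I want to use the keyword "May 2010" to pick out these rows and grab the hyperlink address from the <a href="..."> tag in each of them. Can anyone take a look? Thanks!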
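import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.NodeClassFilter;
import org.htmlparser.filters.OrFilter;
import org.htmlparser.tags.TableColumn;
import org.htmlparser.tags.TableRow;
import org.htmlparser.tags.TableTag;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class TableDemo
{
    public static void main(String[] args)
    {
        // A Java string literal cannot span lines, so the markup is concatenated.
        String ss = "<table>"
                + "<tr>"
                + "<td>1</td>"
                + "<td>2</td>"
                + "<td>3</td>"
                + "</tr>"
                + "</table>";
        Parser myParser = Parser.createParser(ss, "GBK");
        // Match <table> tags only; the OrFilter leaves room for more filters later.
        NodeFilter tableFilter = new NodeClassFilter(TableTag.class);
        OrFilter lastFilter = new OrFilter();
        lastFilter.setPredicates(new NodeFilter[] { tableFilter });
        try
        {
            NodeList nodeList = myParser.parse(lastFilter);
            // Use <, not <=, so the loop stops at the last valid index.
            for (int i = 0; i < nodeList.size(); i++)
            {
                if (nodeList.elementAt(i) instanceof TableTag)
                {
                    TableTag tag = (TableTag) nodeList.elementAt(i);
                    TableRow[] rows = tag.getRows();
                    // Start at 0 so the only row of this sample is printed;
                    // start at 1 instead if the first row is a header to skip.
                    for (int j = 0; j < rows.length; j++)
                    {
                        TableRow tr = rows[j];
                        TableColumn[] td = tr.getColumns();
                        System.out.println(td.length);
                        for (int k = 0; k < td.length; k++)
                        {
                            System.out.println(td[k].toPlainTextString());
                        }
                    }
                }
            }
        } catch (ParserException e)
        {
            e.printStackTrace();
        }
    }
}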
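From what you have posted, the only workable approach for now is to filter on td: get each td's text, compare it with the text you are looking for, and if it matches, get its parent node (the tr).
There are then two cases:
1. If every tr that contains an <a> is a single row of three columns, call getChild(index) on the parent node to get the cell, then take the LinkTag inside that cell and read its href attribute. Keep in mind that whitespace text between the tags may also count as child nodes, so the index of the third cell is not necessarily 2.
2. If the number of columns is not fixed, you have to iterate over the children and do the check inside the loop.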
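The steps above could be sketched roughly as follows. This is only a sketch under a few assumptions: the htmlparser jar is on the classpath, the rows sit inside a <table> (the posted fragment does not show one), and the keyword is always in the first cell. The class name MonthLinks and the method hrefsFor are made up for illustration. It uses getColumns() rather than getChild(index), and searches the whole row for links, so it should cover both cases.

```java
import java.util.ArrayList;
import java.util.List;

import org.htmlparser.Parser;
import org.htmlparser.filters.NodeClassFilter;
import org.htmlparser.tags.LinkTag;
import org.htmlparser.tags.TableColumn;
import org.htmlparser.tags.TableRow;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class MonthLinks {
    /** Returns the raw href of every link in rows whose first cell equals keyword. */
    public static List<String> hrefsFor(String html, String keyword) {
        List<String> hrefs = new ArrayList<String>();
        try {
            Parser parser = Parser.createParser(html, "GBK");
            // Collect every <tr> in the document, at any depth.
            NodeList rows = parser.extractAllNodesThatMatch(
                    new NodeClassFilter(TableRow.class));
            for (int i = 0; i < rows.size(); i++) {
                TableRow row = (TableRow) rows.elementAt(i);
                TableColumn[] cells = row.getColumns();
                // Assumption: the keyword ("May 2010") is in the first cell.
                if (cells.length == 0
                        || !keyword.equals(cells[0].toPlainTextString().trim())) {
                    continue;
                }
                NodeList kids = row.getChildren();
                if (kids == null) {
                    continue;
                }
                // Search the row recursively, so the link may be in any column.
                NodeList links = kids.extractAllNodesThatMatch(
                        new NodeClassFilter(LinkTag.class), true);
                for (int k = 0; k < links.size(); k++) {
                    LinkTag link = (LinkTag) links.elementAt(k);
                    hrefs.add(link.getAttribute("href"));
                }
            }
        } catch (ParserException e) {
            throw new IllegalStateException("could not parse the HTML", e);
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // The two rows from the question, wrapped in a <table> (an assumption).
        String html = "<table>"
                + "<tr bgcolor=\"#f1f5fa\"><td nowrap class=\"smalltext\">May 2010</td>"
                + "<td class=\"smalltext\">United Kingdom</td><td class=\"smalltext\">"
                + "<a href=\"/WebObjects/iTunesConnect.woa/wo/3.0.0.5.1.7.1.1.2.15.0.5.1\">"
                + "80075911_0510_GB_PYMT.txt</a></td></tr>"
                + "<tr bgcolor=\"#f1f5fa\"><td nowrap class=\"smalltext\">May 2010</td>"
                + "<td class=\"smalltext\">United Kingdom</td><td class=\"smalltext\">"
                + "<a href=\"/WebObjects/iTunesConnect.woa/wo/3.0.0.5.1.7.1.1.2.15.1.5.1\">"
                + "80075911_0510_GB.txt</a></td></tr>"
                + "</table>";
        for (String href : hrefsFor(html, "May 2010")) {
            System.out.println(href);
        }
    }
}
```

getAttribute("href") returns the value as written in the page, so a relative address like the ones above still has to be joined with the site's base URL before it can be fetched.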
<td nowrap class="smalltext">May 2010</td>
<td class="smalltext">United Kingdom</td>
<td class="smalltext">
<a href="/WebObjects/iTunesConnect.woa/wo/3.0.0.5.1.7.1.1.2.15.0.5.1">
80075911_0510_GB_PYMT.txt
</a>
</td>
</tr>
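It looks like this. All I know is "May 2010", and I want to get the URL of the <a> inside that row. I tried the approach the poster above described (match the table first, then match the tr), but it is still hard to get the URL that way.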
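If the parser route stays awkward, a library-free fallback may be enough for markup as regular as this. Note that this swaps htmlparser for java.util.regex, which is brittle on arbitrary HTML and only reasonable here because every row has the same shape. The class name RegexMonthLinks and the method hrefsFor are hypothetical, and the sketch assumes the keyword is in the first td and that href values are double-quoted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMonthLinks {
    // One <tr>...</tr> block; DOTALL lets '.' cross line breaks, and the
    // reluctant .*? keeps each row as a separate match.
    private static final Pattern ROW = Pattern.compile(
            "<tr\\b[^>]*>(.*?)</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // The first <td>...</td> found in a row body.
    private static final Pattern CELL = Pattern.compile(
            "<td\\b[^>]*>(.*?)</td>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // A double-quoted href value inside an <a ...> tag.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*\\bhref\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the href of every link in rows whose first cell equals keyword. */
    public static List<String> hrefsFor(String html, String keyword) {
        List<String> hrefs = new ArrayList<String>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            String body = row.group(1);
            Matcher cell = CELL.matcher(body);
            // Skip rows with no cells or with a different first cell.
            if (!cell.find() || !keyword.equals(cell.group(1).trim())) {
                continue;
            }
            Matcher href = HREF.matcher(body);
            while (href.find()) {
                hrefs.add(href.group(1));
            }
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String html = "<tr bgcolor=\"#f1f5fa\">\n"
                + "<td nowrap class=\"smalltext\">May 2010</td>\n"
                + "<td class=\"smalltext\">United Kingdom</td>\n"
                + "<td class=\"smalltext\">\n"
                + "<a href=\"/WebObjects/iTunesConnect.woa/wo/3.0.0.5.1.7.1.1.2.15.0.5.1\">\n"
                + "80075911_0510_GB_PYMT.txt\n"
                + "</a>\n"
                + "</td>\n"
                + "</tr>";
        for (String href : hrefsFor(html, "May 2010")) {
            System.out.println(href);
        }
    }
}
```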