Web-Harvest (100 points)

100 points to anyone who can write an example of crawling a web page with Web-Harvest, using http://www.fenzhi.com/xsc1p1.html as the example.
Below is the script I wrote:
<?xml version="1.0" encoding="UTF-8"?>
<config>
<include path="functions.xml" />
<file action="write" path="fz/2.xml" charset="UTF-8">
<![CDATA[<catalog>]]>
<empty>
<var-def name="priceList" id="priceList">
<xpath expression="//div[@class='winnerLink']">
<html-to-xml>
<http url="http://www.fenzhi.com/xsc1p1.html" />
</html-to-xml>
</xpath>
</var-def>
</empty>
<loop item="item" index="i">
<list>
<var name="priceList"></var>
</list>
<body>
<xquery>
<xq-param name="item" type="node()">
<var name="item" />
</xq-param>
<xq-expression>
<![CDATA[
declare variable $item as node() external;
let $name := data($item)
return
    <info name='{normalize-space($name)}'>
        <name>{normalize-space($name)}</name>
    </info>
]]>
</xq-expression>
</xquery>
</body>
</loop>
<![CDATA[</catalog>]]>
</file>
</config>
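What I want: from the page http://www.fenzhi.com/xsc1p1.html, scrape company names such as Huawei and IBM and save them, then scrape each hyperlink (for example gsx3131.html), join it with www.fenzhi.com to form a new URL, open that URL, and scrape the company profile.
What confuses me right now is how to iterate over two result sets inside the same loop. They always end up merged together. Sigh.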
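
One possible way around the "two result sets in one loop" problem is to loop over a single node set (the link elements) and read both the name and the href from each node inside the loop body, instead of building two separate lists. The config below is only a sketch under assumptions: it assumes every company on the listing page is an `<a>` element inside `div.winnerLink`, and the `div[@class='intro']` selector for the company profile is purely hypothetical, since the detail page markup is not shown in this thread. The variable names `listPage`, `href` and `detailPage` are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
    <!-- Fetch the listing page once and keep it as XML. -->
    <var-def name="listPage">
        <html-to-xml>
            <http url="http://www.fenzhi.com/xsc1p1.html"/>
        </html-to-xml>
    </var-def>

    <file action="write" path="fz/2.xml" charset="UTF-8">
        <![CDATA[<catalog>]]>
        <loop item="item" index="i">
            <list>
                <!-- ASSUMPTION: each company is an <a> inside div.winnerLink. -->
                <xpath expression="//div[@class='winnerLink']//a">
                    <var name="listPage"/>
                </xpath>
            </list>
            <body>
                <empty>
                    <!-- Relative link of the current item, e.g. gsx3131.html. -->
                    <var-def name="href">
                        <xpath expression="data(//a/@href)">
                            <var name="item"/>
                        </xpath>
                    </var-def>
                    <!-- Join the site root and the relative link, then fetch the detail page. -->
                    <var-def name="detailPage">
                        <html-to-xml>
                            <http url="http://www.fenzhi.com/${href}"/>
                        </html-to-xml>
                    </var-def>
                </empty>
                <xquery>
                    <xq-param name="item" type="node()">
                        <var name="item"/>
                    </xq-param>
                    <xq-param name="detail" type="node()">
                        <var name="detailPage"/>
                    </xq-param>
                    <xq-expression><![CDATA[
                        declare variable $item as node() external;
                        declare variable $detail as node() external;
                        (: Name and link both come from the same $item node. :)
                        let $name := normalize-space(data($item))
                        let $href := data(($item//@href)[1])
                        (: ASSUMPTION: hypothetical selector for the company profile. :)
                        let $intro := normalize-space(string-join($detail//div[@class='intro']//text(), ' '))
                        return
                            <info name="{$name}">
                                <name>{$name}</name>
                                <url>http://www.fenzhi.com/{$href}</url>
                                <intro>{$intro}</intro>
                            </info>
                    ]]></xq-expression>
                </xquery>
            </body>
        </loop>
        <![CDATA[</catalog>]]>
    </file>
</config>
```

Because each pass of the loop handles exactly one link node, the name, the URL and the profile stay paired in one `<info>` element and never get merged across companies. The selectors would need to be adjusted after inspecting the real page source.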

Solution »


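Java code:

import org.webharvest.definition.ScraperConfiguration;
import org.webharvest.runtime.Scraper;

// The class name RunScraper is a placeholder; the original post showed only main().
public class RunScraper {
    public static void main(String[] args) {
        // Start timing before the scrape runs so the elapsed time is meaningful
        long startTime = System.currentTimeMillis();
        try {
            // Load the Web-Harvest configuration file
            ScraperConfiguration config = new ScraperConfiguration(
                    "H:\\workspace\\nutch\\src\\com\\jsq\\nutch\\jianjie.xml");
            // Working directory; the scraped XML is saved here
            Scraper scraper = new Scraper(config, "E:\\tmp");
            scraper.setDebug(true);
            scraper.execute();
            System.out.println("Elapsed ms: " + (System.currentTimeMillis() - startTime));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}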