PHP scraping code

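1. Use <?php instead of the short tag <? (short tags can be disabled by the short_open_tag setting). 2. Write function instead of Function (keywords are case-insensitive, so this is mainly style). For anything else, please post the error message...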

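For the HTML-parsing part,
don't match on fixed tag strings;
parse the HTML tags instead, which is more accurate.
I won't go into the rest;
OP, take a look at an HTML-parsing library, since that is the key to scraping.
http://php-html.sourceforge.net/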
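A minimal sketch of "parse the tags instead of matching fixed strings". I don't know the API of the library linked above, so this swaps in PHP's built-in DOM extension (DOMDocument + DOMXPath). The function name extract_link_texts and the "Pos" class are only for illustration. For GB2312 pages, loadHTML() relies on the page's own meta charset tag, so check the encoding of what you get back.

```php
<?php
// Hypothetical helper: return the text of every <a> whose class list contains $class,
// using PHP's DOM extension instead of fixed tag strings.
function extract_link_texts(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML: collect libxml warnings instead of printing them
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word, so class="Pos big" counts but class="Position" does not
    $query = sprintf('//a[contains(concat(" ", normalize-space(@class), " "), " %s ")]', $class);

    $texts = [];
    foreach ($xpath->query($query) as $a) {
        $texts[] = trim($a->textContent);
    }
    return $texts;
}
```

For example, extract_link_texts(file_get_contents($url), 'Pos') returns the link texts in document order.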
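Post your code using CSDN's built-in code formatting.
As it is, it's hard to read. Also, where does it fail, and what error message do you get?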
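One reason there may be no error message: the code prefixes file_get_contents with @, which hides the warning. A minimal sketch that keeps the @ but still reports the reason, assuming PHP 7+ (fetch_or_explain is a made-up name):

```php
<?php
// While debugging you can also show all errors:
//   error_reporting(E_ALL); ini_set('display_errors', '1');

// Hypothetical helper: fetch a URL and throw with PHP's own error text on failure.
function fetch_or_explain(string $url): string
{
    // @ suppresses the printed warning, but error_get_last() still records it
    $c = @file_get_contents($url);
    if ($c === false) {
        $err = error_get_last();
        throw new RuntimeException("Failed to fetch $url: " . ($err['message'] ?? 'unknown error'));
    }
    return $c;
}
```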
If you want to learn scraping, learn regular expressions first, then cURL.
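As a starting point for the cURL side, here is a minimal fetch sketch. It assumes the curl extension is installed; the function name, the timeout default and the User-Agent string are all arbitrary choices, not anything a particular site requires.

```php
<?php
// Hypothetical helper: fetch a page with cURL; returns null on any failure.
function curl_fetch(string $url, int $timeout = 10): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body as a string instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; demo-scraper)', // made-up UA string
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // curl_exec() returns false on transport errors; also treat non-2xx responses as failures
    if ($body === false || $status < 200 || $status >= 300) {
        return null;
    }
    return $body;
}
```

Compared with file_get_contents, cURL gives you timeouts, redirects, headers and cookies in one place, which matters once you scrape more than a few pages.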
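<?php
// Fetch a page's contents, retrying up to 10 times while the result is empty
function fetch_urlpage_contents($url)
{
    $c = '';
    for ($i = 0; $i < 10; $i++) {
        $c = @file_get_contents($url);
        if ($c !== false && trim($c) != '') {
            break;
        }
    }
    return $c === false ? '' : $c;
}

// Return the text between the first $begin marker and the next $end marker after it
function fetch_match_contents($begin, $end, $c)
{
    // Compare with false: strpos() returns 0 for a match at the very start
    $beginPos = strpos($c, $begin);
    if ($beginPos === false) {
        return '';
    }
    $start = $beginPos + strlen($begin);
    // Search for $end only after $begin, not from the start of the page
    $endPos = strpos($c, $end, $start);
    if ($endPos === false) {
        return '';
    }
    return substr($c, $start, $endPos - $start);
}

// Scrape one page: $ft maps each field to its begin/end markers,
// $th optionally maps each field to a list of old => new replacements
function pick($url, $ft, $th = array())
{
    $c = fetch_urlpage_contents($url);
    $rs = array();
    foreach ($ft as $key => $value) {
        $rs[$key] = fetch_match_contents($value["begin"], $value["end"], $c);
        if (isset($th[$key]) && is_array($th[$key])) {
            foreach ($th[$key] as $old => $new) {
                $rs[$key] = str_replace($old, $new, $rs[$key]);
            }
        }
    }
    return $rs;
}
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
<title>caiji</title>
</head>
<body>
<?php
$url = "http://www.01job.cn/asp/itjob.asp";
$ft = array();
$th = array(); // no replacements (previously $th was used without being defined)
$ft["title"]["begin"] = "<title>";
$ft["title"]["end"] = "</title>";
$rs = pick($url, $ft, $th);
print_r($rs);
?>
</body>
</html>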
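The inner foreach over $th[$key] in pick() can be collapsed into one call, because str_replace() accepts arrays for search and replace and applies the pairs in order, each on the result of the previous one. A small sketch (apply_replacements and the sample replacement map are hypothetical):

```php
<?php
// Hypothetical helper: apply an old => new map such as $th["title"] in a single call.
// Same result as looping over the map and calling str_replace() once per pair.
function apply_replacements(string $text, array $map): string
{
    return str_replace(array_keys($map), array_values($map), $text);
}
```

For example, with $th["title"] = array(" - 01job" => "", "&nbsp;" => " "), pick() could set $rs[$key] = apply_replacements($rs[$key], $th[$key]).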
Here's an example:
$content = file_get_contents('http://www.01job.cn/asp/itjob.asp');
preg_match_all('/<a .*? class="Pos">(.*?)<\/a>/', $content, $arr);
print_r($arr);
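The regex approach can be tried offline on a sample string. This sketch assumes the same class="Pos" markup as above; the lazy (.*?) matters when several links sit on one line, because a greedy (.*) runs on to the last </a>. The s flag lets . match newlines and i makes the match case-insensitive.

```php
<?php
// Hypothetical helper: return the text of each <a ... class="Pos" ...> link.
function extract_pos_links(string $html): array
{
    // [^>]* allows other attributes before or after class="Pos"; (.*?) stops at the first </a>
    preg_match_all('/<a [^>]*class="Pos"[^>]*>(.*?)<\/a>/is', $html, $m);
    return $m[1]; // capture group 1 holds the link text
}
```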
$con .= file_get_contents("http://newhouse.hfhouse.com/HouseList/index/areaId/4/?&p=$i");
$preg = '#<td width="130" rowspan="5" align="center" valign="middle" id="(.*)"><a href="(.*)" target="_blank"><img src="(.*)" alt="(.*)" width="124" height="98" hspace="3" vspace="3" border="0" /></a></td>#iUs';
preg_match_all($preg , $con , $arr);
print_r($arr[4]);
This approach only scrapes one page. I want to scrape multiple list pages; how should I modify the code?
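One way to do it, as a sketch: put the page number into the URL inside a loop, run the pattern on each page, and merge the results. The function name, its parameters and the optional delay are assumptions for illustration. The fetcher is passed in so the loop can be tested without network access.

```php
<?php
// Hypothetical helper: scrape list pages $firstPage..$lastPage and merge capture group $group.
// $urlTemplate contains %d where the page number goes, e.g. "...?&p=%d".
function scrape_pages(string $urlTemplate, int $firstPage, int $lastPage,
                      string $pattern, int $group, callable $fetch, int $delayUs = 0): array
{
    $all = [];
    for ($p = $firstPage; $p <= $lastPage; $p++) {
        $html = $fetch(sprintf($urlTemplate, $p));
        if ($html === false || $html === '') {
            continue; // skip pages that failed to load or came back empty
        }
        if (preg_match_all($pattern, $html, $m)) {
            $all = array_merge($all, $m[$group]);
        }
        if ($delayUs > 0) {
            usleep($delayUs); // optional pause between requests to go easy on the server
        }
    }
    return $all;
}
```

For example, with the $preg pattern above and its 4th group (the alt text), over pages 1 to 5 (a made-up page count): $alts = scrape_pages("http://newhouse.hfhouse.com/HouseList/index/areaId/4/?&p=%d", 1, 5, $preg, 4, 'file_get_contents', 500000); Alternatively, keep the original $con .= line, wrap it in for ($i = 1; $i <= 5; $i++) { ... } with $con = ''; before the loop, and call preg_match_all once after the loop.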