How to scrape a company-information website with PHP

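I want to write a small program that scrapes the content of a company-information website, processes it, and displays it on my own site, so that I don't have to enter every record by hand.
My plan is to cut out the content of a DIV by its ID and store the records in a database one by one, but many DIVs on the same page share the same ID value, and I don't know how to extract them.
Can anyone suggest a good approach?
For example:
<div id='coname'>Company 1</div>......
<div id='coname'>Company 2</div>......
<div id='coname'>Company 3</div>......
<div id='coname'>Company 4</div>......
The IDs are identical, so I don't know how to extract each one.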

Solutions »


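Take the code below as an example. If the page contains N <h2>....</h2> records and I want to extract all of them, how should I do it? N is not known in advance.
<?php
// Return the text between the first $start marker and the next $end marker.
function str_substr($str, $start, $end) {
    $x = strpos($str, $start);
    if ($x === false) {
        return '';
    }
    $from = $x + strlen($start);       // first character after $start
    $to = strpos($str, $end, $from);   // search for $end only after $start
    return $to === false ? '' : substr($str, $from, $to - $from);
}

$url = "http://www.smartweb.cn"; // set this to the page you want to scrape
$str = @file_get_contents($url); // file_get_contents reads the whole file into a string
if ($str === false) {
    exit("Could not fetch $url");
}
$start = '<h2>';  // HTML right before the content; ideally unique in the page
$end = '</h2>';   // HTML right after the content; ideally unique in the page
$content = str_substr($str, $start, $end);
echo $content; // print the scraped content for testing
?>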
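For what it's worth, the strpos/substr approach above can also be extended to collect every occurrence by searching again from where the previous match ended. A minimal sketch, assuming both markers are non-empty strings; the name str_substr_all is made up for illustration, and the sample markup is inline so nothing is fetched from a real page:

```php
<?php
// Hypothetical helper: return every substring found between $start and $end.
// Assumes $start and $end are non-empty.
function str_substr_all(string $str, string $start, string $end): array
{
    $results = [];
    $offset = 0;
    while (($x = strpos($str, $start, $offset)) !== false) {
        $from = $x + strlen($start);        // first character after the opening marker
        $to = strpos($str, $end, $from);    // next closing marker after it
        if ($to === false) {
            break;                          // opening marker without a close: stop
        }
        $results[] = substr($str, $from, $to - $from);
        $offset = $to + strlen($end);       // continue searching after this match
    }
    return $results;
}

// Inline sample markup instead of a live page.
$html = '<h2>News 1</h2><p>text</p><h2>News 2</h2><h2>News 3</h2>';
print_r(str_substr_all($html, '<h2>', '</h2>'));
```

This only does exact, case-sensitive marker matching, so <H2> or <h2 class="..."> would be missed; that is where a regular expression becomes more convenient.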
preg_match_all('/<h2>(.*)<\/h2>/Usi',$str,$matches);
print_r($matches);
OP, you should learn to write regular expressions; scraping is hard to do without them.
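The same idea applies to the repeated <div id='coname'> blocks from the original question. A rough sketch, assuming the company names contain no nested <div> tags; the sample markup is modeled on the question rather than taken from a real page:

```php
<?php
// Sample markup modeled on the question; a real page would come from file_get_contents().
$html = "<div id='coname'>Company 1</div><p>...</p>\n"
      . "<div id='coname'>Company 2</div><p>...</p>\n"
      . "<div id=\"coname\"> Company 3 </div>";

// ['"] accepts either quote style, [^>]* tolerates extra attributes,
// (.*?) is a lazy capture; s lets . match newlines, i ignores case.
preg_match_all('/<div\s+id=[\'"]coname[\'"][^>]*>(.*?)<\/div>/si', $html, $matches);

// $matches[1] holds only the text captured between the tags.
$companies = array_map('trim', $matches[1]);
print_r($companies);
```

preg_match_all returns every match on the page, so it does not matter that the IDs are identical. If the DIVs can contain other DIVs, a regular expression like this will cut the content short and an HTML parser would be a safer choice.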
Could you make that more complete? I have used PHP for several years but have never really used regular expressions.
<?php
$url = "http://www.smartweb.cn";
$str = file_get_contents($url);
preg_match_all('/<h2>(.*)<\/h2>/Usi', $str, $matches);
print_r($matches);
?>
$matches holds the matched results; just process that array.
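To make "processing the array" a bit more concrete, here is a small sketch of how $matches is laid out and how the captured text might be cleaned up. The sample string and the $titles variable are assumptions for illustration only:

```php
<?php
// Inline sample instead of a live request, so the structure is easy to see.
$str = "<h2> First <b>title</b> </h2>\n<H2>Second title</H2>\n<h2></h2>";

// U makes .* lazy, s lets . match newlines, i ignores case.
preg_match_all('/<h2>(.*)<\/h2>/Usi', $str, $matches);

// $matches[0] = the full matches including the tags;
// $matches[1] = only the text captured by the parentheses.
$titles = [];
foreach ($matches[1] as $raw) {
    $text = trim(strip_tags($raw));   // drop nested markup and surrounding whitespace
    if ($text !== '') {               // skip empty headings
        $titles[] = $text;
    }
}
print_r($titles);
```

Each element of $titles could then be written to the database one record at a time, preferably with a prepared statement rather than by concatenating the scraped text into SQL.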
I am planning to set up a site for discussing scraping. It is not online yet, but here is the address in advance: http://www.caijike.com
OK, thanks.
$url = "http://www.smartweb.cn/"; // set this to the page you want to scrape
$str = file_get_contents($url);
preg_match_all('/<h2>(.*)<\/h2>/Usi', $str, $matches);
print_r($matches);