Scraping horoscope quizzes - 调试易

Scraping horoscope quizzes

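http://roll.astro.sina.com.cn/t/aqcs/index.shtml is Sina's list of quizzes. I want to scrape the questions and answers. Could someone suggest a method and general approach for scraping this kind of page?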

Solutions »

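<?php
// There are 34 list pages in total.
$lastPage = 34;
// Page to fetch on this request. Cast to int and clamp to 1..$lastPage so a
// missing, non-numeric, or out-of-range ?i= value cannot cause an endless redirect.
$i = isset($_GET['i']) ? (int) $_GET['i'] : 1;
$i = max(1, min($lastPage, $i));
// Assumes the first page is also reachable as index_1.shtml
// (the question links to index.shtml).
$url = "http://roll.astro.sina.com.cn/t/aqcs/index_$i.shtml";
echo "Scraping page $i<br />\n";
$content = @file_get_contents($url);
if ($content === false) {
  exit("Failed to fetch $url");
}
if (preg_match_all('/<li><a\s+href="(.*?)"[^>]*>(.*?)<\/a>/is', $content, $arr)) {
  // print_r($arr[1]); // URLs
  // print_r($arr[2]); // titles
  foreach ($arr[1] as $k => $v) {
    // $arr[1][$k] is the URL, $arr[2][$k] is the title
    // store them in the database here
  }
}
if ($i >= $lastPage) {
  echo "<script>alert('Scraping finished');window.location.href=\"xxxx.php\"</script>";
  exit;
} else {
  // Reload this script so the next request handles the next page.
  $next = $i + 1;
  echo "<script>window.location.href=\"this.php?i=$next\"</script>";
}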
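
Before the database step, it may help to clean up what the regular expression captures. The following is only a minimal sketch: `extract_links` is a hypothetical helper name, the sample markup is made up, and the real Sina pages may be structured or encoded differently. It reuses the pattern from the answer, strips nested tags and HTML entities from each title, and drops links that repeat.

```php
<?php
// Hypothetical helper (not part of the original answer): turn list-page HTML
// into clean url/title pairs that are ready to be stored.
function extract_links(string $html): array
{
    $links = [];
    // Same pattern as in the answer: each <li><a href="...">title</a>.
    if (preg_match_all('/<li><a\s+href="(.*?)"[^>]*>(.*?)<\/a>/is', $html, $m, PREG_SET_ORDER)) {
        foreach ($m as $match) {
            $url = trim($match[1]);
            // Titles may contain nested tags or entities; reduce them to plain text.
            $title = trim(html_entity_decode(strip_tags($match[2]), ENT_QUOTES, 'UTF-8'));
            if ($url === '' || $title === '') {
                continue; // skip empty anchors
            }
            // Key by URL so a link repeated on the page is kept only once.
            $links[$url] = ['url' => $url, 'title' => $title];
        }
    }
    return array_values($links);
}

// Made-up sample markup, for illustration only.
$sample = '<ul><li><a href="http://example.com/a.shtml" target="_blank"><b>Quiz A</b></a></li>'
        . '<li><a href="http://example.com/b.shtml">Quiz &amp; B</a></li>'
        . '<li><a href="http://example.com/a.shtml">Quiz A</a></li></ul>';
$links = extract_links($sample);
print_r($links);
```

Each returned entry can then be inserted as a row, and each `url` fetched in turn with the same fetch-and-match approach to pull out the questions and answers of an individual quiz.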