This is the URL I want to scrape:
http://data-m.gtzy123.com/search_gs.aspx?c=JL8HA246220&w=2.65&mod=1
But I can't get the HTML of the page it redirects to. Could someone please help?
==================
<?php
$htmlstr = get_html("http://data-m.gtzy123.com/search_gs.aspx?c=JL8HA246220&w=2.65&mod=1");
echo $htmlstr;

function get_html($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 120);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it
    // Follow 302 redirects. Even with this set, the final page's content is not fetched.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $content = curl_exec($ch);
    curl_close($ch);
    return $content;
}
===========
print_r(get_headers($url));
You will see a result like this:
Array
(
[0] => HTTP/1.1 302 Found
[1] => Cache-Control: private
[2] => Content-Type: text/html; charset=utf-8
[3] => Location: /default.aspx?c=JL8HA246220&w=2.65&mod=1
[4] => Server: Microsoft-IIS/8.5
[5] => X-AspNet-Version: 4.0.30319
[6] => X-Powered-By: ASP.NET
[7] => Date: Mon, 11 Jun 2018 22:44:38 GMT
[8] => Connection: close
[9] => Content-Length: 165
[10] => HTTP/1.1 302 Found
[11] => Cache-Control: private
[12] => Content-Type: text/html; charset=utf-8
[13] => Location: /result.aspx
[14] => Server: Microsoft-IIS/8.5
[15] => Set-Cookie: ASP.NET_SessionId=ajf3klnysxhfnk4pqyop0x4c; path=/; HttpOnly
[16] => X-AspNet-Version: 4.0.30319
[17] => X-Powered-By: ASP.NET
[18] => Date: Mon, 11 Jun 2018 22:44:38 GMT
[19] => Connection: close
[20] => Content-Length: 129
[21] => HTTP/1.1 302 Found
[22] => Cache-Control: private
[23] => Content-Type: text/html; charset=utf-8
[24] => Location: /Default.aspx
[25] => Server: Microsoft-IIS/8.5
[26] => X-AspNet-Version: 4.0.30319
[27] => X-Powered-By: ASP.NET
[28] => Date: Mon, 11 Jun 2018 22:44:39 GMT
[29] => Connection: close
[30] => Content-Length: 130
[31] => HTTP/1.1 200 OK
[32] => Cache-Control: private
[33] => Content-Type: text/html; charset=utf-8
[34] => Server: Microsoft-IIS/8.5
[35] => Set-Cookie: ASP.NET_SessionId=hrobjv15rhmjubnsleiaw0e2; path=/; HttpOnly
[36] => X-AspNet-Version: 4.0.30319
[37] => X-Powered-By: ASP.NET
[38] => Date: Mon, 11 Jun 2018 22:44:39 GMT
[39] => Connection: close
[40] => Content-Length: 2630
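)
The 302 responses set a cookie (see the Set-Cookie: ASP.NET_SessionId header on the redirect to /result.aspx). If you don't accept that cookie and send it back, you will never reach the target page.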
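One way to accept and resend the cookie is to turn on cURL's cookie engine, so that a cookie set on one redirect hop is sent along on the next. What follows is a minimal sketch under that assumption, not a verified fix for this particular site. The function names build_cookie_aware_options and get_html_with_cookies are made up for illustration, and the CURLOPT_MAXREDIRS limit of 10 is an arbitrary choice.

```php
<?php
// Hypothetical helper: builds the cURL options as an array so they can be inspected.
function build_cookie_aware_options(string $url, string $cookieFile = ''): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow the chain of 302 redirects
        CURLOPT_MAXREDIRS      => 10,    // arbitrary guard against a redirect loop
        CURLOPT_TIMEOUT        => 120,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1',
        // Enables the cookie engine. An empty string loads no cookies but keeps
        // cookies received during this request in memory for the handle;
        // a file path additionally loads cookies from that file first.
        CURLOPT_COOKIEFILE     => $cookieFile,
    ];
    if ($cookieFile !== '') {
        // Also write the cookies back to the file when the handle is released.
        $options[CURLOPT_COOKIEJAR] = $cookieFile;
    }
    return $options;
}

// Hypothetical replacement for get_html() that keeps cookies across redirects.
function get_html_with_cookies(string $url, string $cookieFile = ''): string
{
    $ch = curl_init();
    curl_setopt_array($ch, build_cookie_aware_options($url, $cookieFile));
    $content = curl_exec($ch);
    if ($content === false) {
        // Surface the cURL error instead of silently returning false.
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $content; // the handle is released when $ch goes out of scope
}
```

To try it, call echo get_html_with_cookies("http://data-m.gtzy123.com/search_gs.aspx?c=JL8HA246220&w=2.65&mod=1"); in place of the original get_html() call. Pass a writable file path as the second argument if the session cookie should survive between separate runs of the script.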