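PHP simulated-login program
I've recently wanted to scrape some information from our library's website, but some
pages can only be fetched after logging in. Based on material I found online, I wrote a
simulated-login scraper (code below). The send_cookie function submits the login
information and obtains the cookie; the receive_cookie function reads the page that is
only available after login. send_cookie reads the response and obtains the cookie
successfully, but receive_cookie never manages to read anything. I'm sharing the scraper
here, and I'd appreciate it if someone could help me figure out where the problem is.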
<?php
$postData = "barcode=13000107071359&fangshi=0&password=&x=24&y=9";
$sent_posturl = "http://59.64.144.2/reader/login.jsp?str_kind=login";
$receive_posturl = "http://59.64.144.2/reader/infoList.jsp";
$sent_path = "/reader/login.jsp?str_kind=login";
$receive_path = "/reader/infoList.jsp";
$sessid = send_cookie($postData, $sent_posturl, $sent_path);
receive_cookie($receive_posturl, $receive_path, $sessid);

// Submit the login form and return the JSESSIONID value found in the response.
function send_cookie($postData, $posturl, $path)
{
    $postUrl = parse_url($posturl);
    $host = isset($postUrl['host']) ? $postUrl['host'] : "";
    $port = isset($postUrl['port']) ? $postUrl['port'] : 80;
    $sessid = "";
    // fsockopen() fills $errno and $errstr by reference; no call-time "&" is needed.
    $fsp = fsockopen($host, $port, $errno, $errstr, 30);
    if (!$fsp) {
        print "\nopen socket failed\n";
    } else {
        fputs($fsp, "POST ".$path." HTTP/1.1\r\n");
        fputs($fsp, "Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*\r\n");
        fputs($fsp, "Accept-Encoding: gzip, deflate\r\n");
        fputs($fsp, "Accept-Language: zh-cn\r\n");
        //fputs($fsp, "Referer: http://59.64.144.2/reader/login.jsp?str_kind=login\r\n");
        fputs($fsp, "Content-Type: application/x-www-form-urlencoded\r\n");
        fputs($fsp, "Connection: Keep-Alive\r\n");
        fputs($fsp, "Host: ".$host."\r\n");
        fputs($fsp, "User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)\r\n");
        // The double CRLF ends the header block; the form data follows as the body.
        fputs($fsp, "Content-Length: ".strlen($postData)."\r\n\r\n");
        fputs($fsp, $postData);
        $resp = "";
        do {
            if (strlen($out = fread($fsp, 1024)) == 0) break;
            $resp .= $out;
            // Match against the accumulated response so that a cookie split
            // across two reads is still found.
            if (preg_match('/JSESSIONID=([0-9a-z]+);/i', $resp, $matches)) {
                $sessid = $matches[1];
                //echo("sessid=$sessid");
            }
        } while (true);
        //$resp=str_replace('window.location="','window.location="http://59.64.144.2/reader/',$resp); // test code, used to show that send_cookie works
        //echo "$resp";
        //echo "<br><br>".nl2br($resp);
        fclose($fsp); // close the socket before returning
    }
    return $sessid;
}
// Request the page that requires login, sending the session cookie along.
function receive_cookie($posturl, $path, $sessid)
{
    $postUrl = parse_url($posturl);
    $host = isset($postUrl['host']) ? $postUrl['host'] : "";
    $port = isset($postUrl['port']) ? $postUrl['port'] : 80;
    // fsockopen() fills $errno and $errstr by reference; no call-time "&" is needed.
    $fp = fsockopen($host, $port, $errno, $errstr, 30);
    if (!$fp) {
        print "\nopen socket failed\n";
    } else {
        fputs($fp, "GET ".$path." HTTP/1.1\r\n");
        fputs($fp, "Cookie: JSESSIONID=".$sessid."\r\n");
        fputs($fp, "Referer: http://59.64.144.2/reader/login.jsp?str_kind=login\r\n");
        fputs($fp, "Host: ".$host."\r\n");
        fputs($fp, "User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)\r\n");
        $response = "";
        // Read the response line by line until nothing more comes back.
        do {
            if (strlen($out = fgets($fp, 1024)) == 0) break;
            $response .= $out;
            //echo($out);
        } while (true);
        echo($response);
        /*$hlen = strpos($response,"\r\n\r\n"); // on Linux it is "\n\n"
        echo "hlen=$hlen";
        $header = substr($response, 0, $hlen);
        echo "header=$header<hr><hr>";*/
        fclose($fp);
    }
}
?>
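One possible cause, going only by the code as posted: the GET request in receive_cookie never sends the empty line that ends an HTTP header block, and it does not ask the server to close the connection, so the server may still be waiting for more headers while the read loop waits for a response. Below is a minimal sketch of building a complete request and of pulling the session ID out of the raw response. The helper names build_get_request and extract_session_id and the sample session ID are made up for illustration, and this has not been tried against the library server.

```php
<?php
// Hypothetical helper: build a complete HTTP/1.1 GET request as one string.
function build_get_request($host, $path, $sessid)
{
    $lines = array(
        "GET " . $path . " HTTP/1.1",
        "Host: " . $host,
        "Cookie: JSESSIONID=" . $sessid,
        // Ask the server to close the socket after the response so that a
        // read-until-EOF loop can terminate.
        "Connection: close",
    );
    // The trailing "\r\n\r\n" ends the last header line and then adds the
    // empty line that marks the end of the header block.
    return implode("\r\n", $lines) . "\r\n\r\n";
}

// Hypothetical helper: extract the JSESSIONID value from raw response headers.
// Returns null when no such Set-Cookie header is present.
function extract_session_id($rawResponse)
{
    if (preg_match('/^Set-Cookie:\s*JSESSIONID=([^;\s]+)/mi', $rawResponse, $m)) {
        return $m[1];
    }
    return null;
}

// Usage: the session ID here is a placeholder, not a real one.
$request = build_get_request("59.64.144.2", "/reader/infoList.jsp", "ABC123");
echo $request;
```

The resulting string could be written to the socket with a single fputs($fp, $request) in place of the separate header writes.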
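As for why you still can't fetch the page even though you already have the session ID: the parameters of your request may be invalid. When implementing this process with the curl functions, you need to specify the cookie path and the location of the cookie file.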
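For the curl route, here is a rough sketch of what specifying the cookie file could look like, assuming the PHP curl extension is installed. The helper names build_curl_options and fetch_with_cookies and the cookie file path in the usage comment are hypothetical, and the sketch has not been run against the library site.

```php
<?php
// Hypothetical helper: curl options that make a request both send the cookies
// already stored in $cookieFile and save any new ones back to it.
function build_curl_options($cookieFile, $postFields = null)
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_COOKIEFILE     => $cookieFile, // file to read cookies from
        CURLOPT_COOKIEJAR      => $cookieFile, // file to write cookies to when the handle closes
        CURLOPT_TIMEOUT        => 30,
    );
    if ($postFields !== null) {
        $options[CURLOPT_POST] = true;
        $options[CURLOPT_POSTFIELDS] = $postFields;
    }
    return $options;
}

// Hypothetical helper: perform one request; returns the body, or false on failure.
function fetch_with_cookies($url, $cookieFile, $postFields = null)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, build_curl_options($cookieFile, $postFields));
    $body = curl_exec($ch);
    // Releasing the handle closes it, which is when libcurl writes the cookie jar.
    unset($ch);
    return $body;
}

// Usage (not executed here; the cookie file path is an assumption):
// $jar = "/tmp/library_cookies.txt";
// fetch_with_cookies("http://59.64.144.2/reader/login.jsp?str_kind=login", $jar,
//     "barcode=13000107071359&fangshi=0&password=&x=24&y=9");
// echo fetch_with_cookies("http://59.64.144.2/reader/infoList.jsp", $jar);
```

Because both requests point at the same cookie file, the second call sends whatever session cookie the login response set, without parsing JSESSIONID by hand.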