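This is the page I want to scrape data from:
http://www.cma-cgm.com/en/eBusiness/Schedules/VoyageFinder/VoyageFinder.aspx?VoyageReference=RE513W&SearchMode=SeachThisVoyage
On the right side of the page there is an "Export PDF" link that downloads a PDF file. The source for that part (after being parsed into HTML) looks like this: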
<a id="ctl00_ContentPlaceBody_Toolbar_Export"
class="CmdBarButton" onMouseOver="MM_swapImage('Export','','/en/App_Themes/Default/Images/Common/CommandBar/Export_on.gif',1)" onMouseOut="MM_swapImgRestore()" href="javascript:__doPostBack('ctl00$ContentPlaceBody$Toolbar$Export','')">
Here is the source of __doPostBack:
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
theForm is defined as follows:
var theForm = document.forms['aspnetForm'];
if (!theForm) {
    theForm = document.aspnetForm;
}
And this is the definition of aspnetForm:
<form name="aspnetForm" method="post" action="VoyageFinder.aspx?VoyageReference=RE513W&SearchMode=SeachThisVoyage" id="aspnetForm">
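My code looks like this (Commons HttpClient 3.x):
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.NameValuePair;
import org.apache.commons.httpclient.methods.PostMethod;

String url = "http://www.cma-cgm.com/en/eBusiness/Schedules/VoyageFinder/VoyageFinder.aspx?VoyageReference=RE513W&SearchMode=SeachThisVoyage";
PostMethod postMethod = new PostMethod(url);
HttpClient httpClient = new HttpClient();
// Only the two fields that __doPostBack sets explicitly are sent here.
NameValuePair[] data = {
    new NameValuePair("__EVENTTARGET", "ctl00$ContentPlaceBody$Toolbar$Export"),
    new NameValuePair("__EVENTARGUMENT", "")
};
postMethod.setRequestBody(data);
int statusCode = httpClient.executeMethod(postMethod);
In practice, instead of returning the PDF file, the request gets redirected to /en/eBusiness/Schedules/VoyageFinder/Default.aspx. What is the reason?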
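One possible cause, offered as a guess rather than a confirmed diagnosis: __doPostBack calls theForm.submit(), so the browser sends every field of aspnetForm, while the code above sends only two. ASP.NET WebForms pages normally carry hidden inputs such as __VIEWSTATE, and a postback without them may be rejected or redirected. The sketch below assumes the page was first fetched with a GET and its HTML is available as a string. The class PostBackFields and its methods are hypothetical helpers, not part of HttpClient, and the regex parsing is deliberately simplistic.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Commons HttpClient): gathers the fields
// a browser would submit when __doPostBack fires on an ASP.NET form.
class PostBackFields {
    // Matches each <input ...> tag; a real HTML parser would be more robust.
    private static final Pattern INPUT_TAG =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns the double-quoted value of one attribute, or null if absent.
    static String attr(String tag, String name) {
        Matcher m = Pattern
                .compile("\\s" + Pattern.quote(name) + "\\s*=\\s*\"([^\"]*)\"",
                        Pattern.CASE_INSENSITIVE)
                .matcher(tag);
        return m.find() ? m.group(1) : null;
    }

    // Collects name/value pairs of all hidden inputs, in document order.
    static Map<String, String> hiddenFields(String html) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        Matcher m = INPUT_TAG.matcher(html);
        while (m.find()) {
            String tag = m.group();
            String name = attr(tag, "name");
            if (name != null && "hidden".equalsIgnoreCase(attr(tag, "type"))) {
                String value = attr(tag, "value");
                fields.put(name, value == null ? "" : value);
            }
        }
        return fields;
    }

    // Mimics __doPostBack: keep every hidden field, then set the two event fields.
    static Map<String, String> forPostBack(String html, String target, String argument) {
        Map<String, String> fields = hiddenFields(html);
        fields.put("__EVENTTARGET", target);
        fields.put("__EVENTARGUMENT", argument);
        return fields;
    }
}
```

Each entry of the returned map would then become one NameValuePair in postMethod.setRequestBody. It is probably also worth issuing the GET and the POST through the same HttpClient instance, so that any session cookie set by the first response is sent back with the second.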
// statusCode and postMethod are the variables from the question's code.
if (statusCode == HttpStatus.SC_MOVED_PERMANENTLY || statusCode == HttpStatus.SC_MOVED_TEMPORARILY) {
    // Read the redirect target from the response headers.
    Header locationHeader = postMethod.getResponseHeader("location");
    String location = null;
    if (locationHeader != null) {
        location = locationHeader.getValue();
        System.out.println("The page was redirected to: " + location);
    } else {
        System.err.println("Location field value is null.");
    }
}
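The Location value read above may be a path such as /en/eBusiness/Schedules/VoyageFinder/Default.aspx rather than a full URL, so it has to be resolved against the request URL before it can be followed. A minimal sketch using only java.net.URI follows; the class name RedirectTarget is made up for illustration.

```java
import java.net.URI;

// Hypothetical helper: turns a Location header value into an absolute URL.
class RedirectTarget {
    // requestUrl is the URL that was requested; location is the header value.
    static String resolve(String requestUrl, String location) {
        // URI.resolve handles absolute URLs, absolute paths, and relative paths.
        return URI.create(requestUrl).resolve(location).toString();
    }
}
```

The resolved URL can then be requested with a GetMethod on the same HttpClient instance.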