Target URL: http://www.scirp.org/Journal/Home.aspx?JournalID=532

I recently wrote a crawler that uses jsoup to parse elements, but I have run into a problem. On this site, each year's data is loaded through an ASP.NET __doPostBack form submission. I already know the form's hidden fields, but when I simulate the form submission I cannot get the data for other years, only the data for the most recent year. I find this very puzzling. Could someone please take a look? Much appreciated! Here is the sample code:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Connection.Method;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ScirpCrawler {
    public static void main(String[] args) {
        String url = "http://www.scirp.org/Journal/Home.aspx?JournalID=532";
        String userAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.87 Safari/537.36";
        // Fetch the page to read the form's hidden parameters (type="hidden")
        Connection connection = Jsoup.connect(url).timeout(60000);
        connection.header("User-Agent", userAgent); // imitate a browser
        Response response;
        try {
            response = connection.method(Method.GET).execute(); // get the response
            System.out.println(response.cookies());
        } catch (IOException e1) {
            e1.printStackTrace();
            return; // nothing to parse if the GET failed
        }
        // Parse the HTML with jsoup to extract the form parameters
        Document doc = Jsoup.parse(response.body()); // convert to a DOM tree
        System.out.println(doc);
        // Map holding the POST data
        Map<String, String> datas = new HashMap<>();
        // The form's input fields
        Elements inputElements = doc.select("form[method=post]").first()
                .select("input[name]");
        for (Element inputElement : inputElements) {
            if ("__EVENTTARGET".equals(inputElement.attr("name"))) {
                // Point the postback at the year link that should be "clicked"
                datas.put("__EVENTTARGET", "ctl00$JournalSimplify1$Repater_Yearslist$ctl04$HyperLink_YearlistLink");
            } else {
                datas.put(inputElement.attr("name"), inputElement.attr("value"));
            }
        }
        /*String searchType="Journal";
        String commonToolkitScripts="1";
        String __ASYNCPOST="true";
        datas.put("ctl00$UserControl_HeaderNoLogin1$UserControl_search$DropDownList_SearchType", searchType);
        datas.put("hiddenInputToUpdateATBuffer_CommonToolkitScripts", commonToolkitScripts);
        datas.put("__ASYNCPOST",__ASYNCPOST);
        String scriptManager1="ctl00$JournalSimplify1$UpdatePanel1|ctl00$JournalSimplify1$Repater_Yearslist$ctl04$HyperLink_YearlistLink";
        datas.put("ctl00$ScriptManager1", scriptManager1);*/
        System.out.println(datas);
        System.out.println(datas.size());
        Connection connection2 = Jsoup.connect(url).timeout(60000);
        try {
            // Send the cookies and the POST data
            Response rs = connection2.ignoreContentType(true)
                    .method(Method.POST).data(datas)
                    .cookies(response.cookies()).execute();
            Document document = Jsoup.parse(rs.body());
            System.out.println(document);
            Elements volnoElements = document.select("span.volno > a");
            System.out.println(volnoElements);
        } catch (IOException e1) {
            e1.printStackTrace();
        }
    }
}
Solutions »
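Open the browser console and find the hidden input tag whose id is __VIEWSTATE. Copy that tag's value in full and paste it into your backend code as the value of the __VIEWSTATE parameter. Then set the other parameters correctly and it will work.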
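
To make the suggestion above concrete, here is a minimal sketch of how the postback form body could be assembled once the hidden field values are known. It uses only the Java standard library. The class name PostBackForm and its methods are hypothetical helpers made up for illustration, not part of jsoup or ASP.NET, and the __VIEWSTATE value shown is a shortened placeholder rather than a real value from the site.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not a jsoup or ASP.NET API.
class PostBackForm {

    /**
     * Copies the hidden fields scraped from the page (for example __VIEWSTATE)
     * and sets the two fields that __doPostBack fills in before submitting.
     */
    static Map<String, String> build(Map<String, String> hiddenFields,
                                     String eventTarget, String eventArgument) {
        Map<String, String> form = new LinkedHashMap<>(hiddenFields);
        form.put("__EVENTTARGET", eventTarget);
        form.put("__EVENTARGUMENT", eventArgument);
        return form;
    }

    /** Encodes the fields as an application/x-www-form-urlencoded body. */
    static String encode(Map<String, String> form) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : form.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(field.getKey())).append('=').append(enc(field.getValue()));
        }
        return body.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> hidden = new LinkedHashMap<>();
        hidden.put("__VIEWSTATE", "/wEPDw=="); // placeholder; paste the full value from the page
        hidden.put("__EVENTTARGET", "");
        hidden.put("__EVENTARGUMENT", "");
        Map<String, String> form = build(hidden,
                "ctl00$JournalSimplify1$Repater_Yearslist$ctl04$HyperLink_YearlistLink", "");
        System.out.println(encode(form));
    }
}
```

With jsoup, the map returned by build could be passed straight to Connection.data(Map), which handles the URL encoding itself; encode is included only to show what ends up on the wire. Any other hidden fields the page's form contains would be passed through unchanged in the same map.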