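I'm working on a project that needs a crawler, and I've run into the following problem. To access a Google blog you first have to log in at https://www.blogger.com (note that this uses the https protocol). After entering the email and password, you land on the blog home page http://www.blogger.com/home?pli=1 (which has switched to the http protocol). I'm using HttpClient 4 to simulate the https login.
I start by creating a host object for https://www.google.com.
The request succeeds and returns the following result: HTTP/1.1 200 OK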
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"><html dir="ltr"><head><title>转接</title><meta http-equiv="refresh" content="0; url=http://www.blogger.com/home?pli=1"></head><body class="null lang_zh"><script type="text/javascript">    var expiry = new Date();    expiry.setTime(expiry.getTime() + 1000);    document.cookie="testCookie=true; expires=" + expiry.toGMTString();    if (document.cookie.indexOf("testCookie=") == -1) {      location.replace("/app/nocookies.html");    } else {      location.replace("http://www.blogger.com/home?pli\x3d1");    }  </script><script src="https://ssl.google-analytics.com/urchin.js" type="text/javascript">        </script><script type="text/javascript">          _uacct="UA-18003-7";          _uanchor=1;          _ufsc=false;          _usample = 10;          urchinTracker();          _uff=0;        </script></body></html>
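The above is a redirect page whose target is http://www.blogger.com/home?pli=1, which now uses the http protocol. I then changed the host to an http://www.blogger.com object and called httpClient.execute(host, httppost);
That call fails with the following error: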
Exception in thread "main" java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:168)
at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:130)
at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:127)
at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:233)
at org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:82)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:210)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:271)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:234)
at org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:259)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:292)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:126)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:410)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:555)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:509)
at WebSite.executeMethod(WebSite.java:78)
at WebSite.get(WebSite.java:73)
at TestGoogle.newBlog(TestGoogle.java:24)
at TestGoogle.main(TestGoogle.java:14)
This may be a single sign-on issue. How can I solve it?
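
For reference, the host switch described above (reading the meta refresh target out of the https response and deriving the http scheme, host and port from it) can be sketched roughly as follows. This is only a minimal illustration using the standard library; the names `RefreshTarget`, `extractRefreshTarget` and `effectivePort` are hypothetical and are not taken from the WebSite or TestGoogle classes in the stack trace, and the regular expression assumes the attribute order seen in the response above.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the redirect target out of a meta refresh page.
class RefreshTarget {
    // Assumes the form <meta http-equiv="refresh" content="N; url=...">
    private static final Pattern META_REFRESH = Pattern.compile(
            "<meta\\s+http-equiv=\"refresh\"\\s+content=\"\\d+;\\s*url=([^\"]+)\"",
            Pattern.CASE_INSENSITIVE);

    // Returns the URL named in the meta refresh tag, if the page has one.
    public static Optional<URI> extractRefreshTarget(String html) {
        Matcher m = META_REFRESH.matcher(html);
        return m.find() ? Optional.of(URI.create(m.group(1))) : Optional.empty();
    }

    // Falls back to the scheme's default port when the URL does not name one.
    public static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    public static void main(String[] args) {
        String html = "<html><head><meta http-equiv=\"refresh\" "
                + "content=\"0; url=http://www.blogger.com/home?pli=1\"></head></html>";
        URI target = extractRefreshTarget(html)
                .orElseThrow(() -> new IllegalStateException("no meta refresh found"));
        // Scheme, host and port are the three values a new host object needs.
        System.out.println(target.getScheme() + " " + target.getHost()
                + " " + effectivePort(target));
    }
}
```

With HttpClient 4, these three values map onto the `HttpHost(hostname, port, scheme)` constructor, so the second host object can be built from the redirect page itself instead of being hard-coded.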