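Hi everyone, I'm trying to build a web crawler that goes through a proxy server, using Apache's open-source HttpClient (import org.apache.http.client.HttpClient;). I've run into a problem and would appreciate any help; I'll give as many points as it takes to whoever helps me solve it. I'm following the example at the URL below, which accesses a web page through a proxy.
http://svn.apache.org/repos/asf/httpcomponents/httpclient/trunk/httpclient/src/examples/org/apache/http/examples/client/ClientExecuteProxy.java
The problem is that even when I enter an arbitrary proxy server address, the page content is still fetched successfully. My machine does not need a proxy to reach the internet. Am I misunderstanding something, or are the parameters set incorrectly? Thanks in advance for any help.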
--------------------------------------------------------------------------------------------------------------------
// make sure to use a proxy that supports CONNECT
HttpHost target = new HttpHost("issues.apache.org", 443, "https");
// The first argument below is the one I changed; I assume this is the right
// place to change the proxy server setting.
HttpHost proxy = new HttpHost("127.0.0.1", 8080, "http");

// The rest is just the basic setup followed by reading the page content;
// I believe this part is fine.
// general setup
SchemeRegistry supportedSchemes = new SchemeRegistry();
// Register the "http" and "https" protocol schemes, they are
// required by the default operator to look up socket factories.
supportedSchemes.register(new Scheme("http",
        PlainSocketFactory.getSocketFactory(), 80));
supportedSchemes.register(new Scheme("https",
        SSLSocketFactory.getSocketFactory(), 443));

// prepare parameters
HttpParams params = new BasicHttpParams();
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
HttpProtocolParams.setContentCharset(params, "UTF-8");
HttpProtocolParams.setUseExpectContinue(params, true);

ClientConnectionManager ccm = new ThreadSafeClientConnManager(params,
        supportedSchemes);
DefaultHttpClient httpclient = new DefaultHttpClient(ccm, params);
httpclient.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY, proxy);

HttpGet req = new HttpGet("/");
System.out.println("executing request to " + target + " via " + proxy);
HttpResponse rsp = httpclient.execute(target, req);
HttpEntity entity = rsp.getEntity();
-----------------------------------------------------------------------------------------------------------------------------------
Please help, everyone. I'll give as many points as it takes.
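One way to narrow this down, independent of HttpClient, might be to check what happens when a request is forced through a proxy address where nothing is listening. The following is only a minimal sketch using the plain JDK; the class name ProxyCheck and the method failsThroughProxy are made up for illustration and are not part of the Apache example. It assumes that a request explicitly routed through a dead proxy should fail to connect rather than return a page.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

// Hypothetical helper, not part of HttpClient or the Apache example.
final class ProxyCheck {

    /**
     * Sends a GET for targetUrl through the given HTTP proxy and reports
     * whether the attempt failed with an I/O error (for example, connection
     * refused because nothing is listening at the proxy address).
     */
    static boolean failsThroughProxy(String proxyHost, int proxyPort, String targetUrl) {
        Proxy proxy = new Proxy(Proxy.Type.HTTP,
                new InetSocketAddress(proxyHost, proxyPort));
        HttpURLConnection conn = null;
        try {
            // Passing the Proxy explicitly makes the JDK use it for this connection.
            conn = (HttpURLConnection) URI.create(targetUrl).toURL().openConnection(proxy);
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            conn.getResponseCode(); // triggers the actual connection
            return false;           // some response came back via the proxy address
        } catch (IOException e) {
            return true;            // could not connect to, or talk through, the proxy
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Same proxy address as in the post; the plain-http target URL is an assumption.
        boolean failed = failsThroughProxy("127.0.0.1", 8080, "http://issues.apache.org/");
        System.out.println("request through proxy failed: " + failed);
    }
}
```

If this check fails to connect while the HttpClient code above still returns the page, that would suggest the proxy setting is not taking effect in the HttpClient code. If this check also gets a response, something may actually be listening at that proxy address.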