I'm on Win98 + freejava3.0 (Sun JDK 1.3).
Could you describe your problem a bit more clearly?
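I thought it over. Fetching a web page over a socket by speaking the HTTP protocol yourself is feasible.
But for this program it comes with a serious limitation.
The reason: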
The most common form of Request-URI is that used to identify a
resource on an origin server or gateway. In this case the absolute
path of the URI MUST be transmitted (see section 3.2.1, abs_path) as
the Request-URI, and the network location of the URI (net_loc) MUST
be transmitted in a Host header field. For example, a client wishing
to retrieve the resource above directly from the origin server would
create a TCP connection to port 80 of the host "www.w3.org" and send
the lines:

GET /pub/WWW/TheProject.html HTTP/1.1
Host: www.w3.org

followed by the remainder of the Request. Note that the absolute path
cannot be empty; if none is present in the original URI, it MUST be
given as "/" (the server root).

So your socketoutput.writeBytes("GET / /HTTP/1.1 \r\n");
happens to work for sohu, but it will not necessarily work for other sites, because the request does not follow the form quoted above.
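If you want to keep the socket version, the rule quoted above could be sketched roughly like this. It is only a sketch: the class name RawHttpGet and the methods buildRequest and fetch are made up for illustration, and the Connection: close header is my own addition so that the server closes the connection and the read loop ends. The response is printed raw, headers included, and the body may still arrive chunked.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;

class RawHttpGet {
    // Builds a minimal HTTP/1.1 GET request.
    // An empty path is sent as "/", and the host goes into the Host header.
    static String buildRequest(String host, String path) {
        if (path == null || path.length() == 0) {
            path = "/";
        }
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"   // assumption: ask the server to close when done
             + "\r\n";
    }

    // Sends the request to port 80 of the host and prints the raw response.
    static void fetch(String host, String path) throws IOException {
        Socket socket = new Socket(host, 80);
        try {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes("US-ASCII"));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream()));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            socket.close();
        }
    }

    public static void main(String[] args) {
        // Only prints the request text; call fetch(host, path) to actually send it.
        System.out.print(buildRequest("www.w3.org", "/pub/WWW/TheProject.html"));
    }
}
```

Keeping the request text in its own method makes it easy to check what is sent before any connection is opened.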
So I suggest a different approach.

import java.io.*;
import java.net.*;
import java.util.*;

public class GetPage {
    public static void main(String args[]) {
        String host = "http://www.sina.com.cn/";
        try {
            URL address = new URL(host);
            URLConnection connection = address.openConnection();
            connection.setUseCaches(true);
            // Plain GET with no request body, so setDoOutput is not needed.
            BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
            String line;
            // Call readLine() once per iteration so that no line is skipped.
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            in.close();
        }
        catch (Exception e) {
            // Print the exception itself so the cause is visible.
            System.out.println("Error: " + e);
        }
    }
}
Properties props = System.getProperties();
props.put("http.proxyHost", "proxyhostname");
props.put("http.proxyPort", "proxyhostport");
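As far as I know, these two properties are JVM-wide and are read by the HTTP handler behind URLConnection, so they should be set before the connection is opened. A small sketch of how that could be wrapped up is below. The class name ProxySetup, the method useHttpProxy, and the values "proxy.example.com" and 8080 are placeholders of mine, not real settings.

```java
import java.util.Properties;

class ProxySetup {
    // Sets the JVM-wide HTTP proxy properties for http:// URLs.
    // Call this before URL.openConnection().
    static void useHttpProxy(String proxyHost, int proxyPort) {
        Properties props = System.getProperties();
        props.put("http.proxyHost", proxyHost);
        // Store the port as a String so System.getProperty can read it back.
        props.put("http.proxyPort", String.valueOf(proxyPort));
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with your own proxy.
        useHttpProxy("proxy.example.com", 8080);
        System.out.println(System.getProperty("http.proxyHost")
                + ":" + System.getProperty("http.proxyPort"));
    }
}
```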