I ran the following test code:
String urlstr = "http://www.51job.com";
URL url = new URL(urlstr);
URLConnection con = url.openConnection();
BufferedReader reader = new BufferedReader(new InputStreamReader(con.getInputStream()));
String line;
while ((line = reader.readLine()) != null)
{
    System.out.println(line);
}
reader.close();
It works fine against Baidu, but against http://www.51job.com (and its sub-pages) it fails: crawling 51job returns nothing at all. Why is that?
Has the site perhaps enabled some anti-crawler protection?
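One common cause of exactly this symptom is the request's User-Agent header: by default `URLConnection` identifies itself as "Java/1.x", and some sites refuse or empty-out responses to that agent string while serving browsers normally. A minimal sketch, assuming 51job filters on User-Agent (the class and helper names here are my own, not from the original post):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class UserAgentFetch
{
    // Open a connection that announces a browser-like User-Agent instead of
    // the default "Java/1.x" string. Nothing is sent over the network yet;
    // the header is only queued until getInputStream() is called.
    static URLConnection browserConnection(String urlstr) throws Exception
    {
        URLConnection con = new URL(urlstr).openConnection();
        con.setRequestProperty("User-Agent",
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
        return con;
    }

    public static void main(String[] args) throws Exception
    {
        URLConnection con = browserConnection("http://www.51job.com");
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(con.getInputStream()));
        String line;
        while ((line = reader.readLine()) != null)
        {
            System.out.println(line);
        }
        reader.close();
    }
}
```

If the site really is blocking on the agent string, this one-line change is usually enough; if it still fails, the block is happening on something else (cookies, referer, or IP).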
<td colspan='2' width='317' height='60' align='left'><a href='http://ac.51job.com/phpAD/adtrace.php?ID=11612470' title='上海巨人网络科技有限公司' border='0' target='_blank'><img src='http://img01.51jobcdn.com/im/images/2010/sh/ztgamec0701_8182.gif' border='0' height='60' width='317'></a></td>
import java.net.*;
import java.io.*;

public class UrlConnectionDemo
{
    public static void main(String[] args)
    {
        // Accumulate the whole page into one string, one line at a time.
        StringBuilder document = new StringBuilder();
        try
        {
            URL url = new URL("http://www.51job.com");
            URLConnection conn = url.openConnection();
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()));
            String line;
            while ((line = reader.readLine()) != null)
            {
                document.append(line).append(' ');
            }
            reader.close();
        }
        catch (MalformedURLException e)
        {
            e.printStackTrace();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
        System.out.println(document.toString());
    }
}
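A second thing worth checking is the character encoding: 51job pages have historically been served as gb2312/GBK, and an `InputStreamReader` built without an explicit charset uses the platform default, which can yield garbled or seemingly empty output. A sketch that picks the charset out of the response's Content-Type header (the `charsetOf` helper is my own, for illustration):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class CharsetAwareFetch
{
    // Extract "charset=..." from a Content-Type header value such as
    // "text/html; charset=gb2312"; fall back to UTF-8 when absent.
    static String charsetOf(String contentType)
    {
        if (contentType != null)
        {
            for (String part : contentType.split(";"))
            {
                part = part.trim();
                if (part.toLowerCase().startsWith("charset="))
                {
                    return part.substring("charset=".length());
                }
            }
        }
        return "UTF-8";
    }

    public static void main(String[] args) throws Exception
    {
        URLConnection conn = new URL("http://www.51job.com").openConnection();
        // getContentType() reads the response header, so the page's own
        // declared encoding drives how the bytes are decoded below.
        String charset = charsetOf(conn.getContentType());
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), charset));
        String line;
        while ((line = reader.readLine()) != null)
        {
            System.out.println(line);
        }
        reader.close();
    }
}
```

Decoding with the declared charset and setting a browser-like User-Agent together cover the two most common reasons a page "works for Baidu but not for site X".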