JavaScript: load the page in a hidden frame and extract the data from the frame's HTML.
Java: use URLConnection to fetch the other page's HTML as a String, then parse it.
(No Chinese input method here!)
http://www.8080.net/hangqing/sp2.asp?cateid=2&proid=1
I want to read the data out of the table on that page programmatically.
Could you give me some working code?
<%@ page contentType="text/html;charset=gb2312"%>
<%
    // Fetch a remote page and echo its HTML into this JSP's output.
    String sCurrentLine = "";
    String sTotalString = "";
    java.io.InputStream l_urlStream;
    java.net.URL l_url = new java.net.URL("http://www.163.net/");
    java.net.HttpURLConnection l_connection = (java.net.HttpURLConnection) l_url.openConnection();
    l_connection.connect();
    l_urlStream = l_connection.getInputStream();
    // InputStreamReader without a charset argument decodes with the platform default charset.
    java.io.BufferedReader l_reader = new java.io.BufferedReader(new java.io.InputStreamReader(l_urlStream));
    // readLine() strips line terminators, so the page ends up as one long line.
    while ((sCurrentLine = l_reader.readLine()) != null)
    {
        sTotalString += sCurrentLine;
    }
    l_reader.close();
    out.println(sTotalString);
%>
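Postscript
The code is fairly simple, but I think it can serve as the basis for a "web crawler": for example, find the href links in a page, fetch each linked page, and keep "grabbing" from there (limiting the depth, of course). That way you could build a "web search" feature. What is left for you to do is parse that string.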
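The crawler idea above could be sketched roughly as follows. This is only a minimal illustration, not the poster's code: the class name LinkCrawler, its methods, and the example.test URLs are hypothetical, links are assumed to be absolute and quoted, and the page fetcher is passed in as a function so the sketch can run without a network (in practice it could wrap the URLConnection code shown earlier).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and methods are illustrative only.
final class LinkCrawler {
    // Matches href="..." or href='...' (quoted values only; a real HTML parser is more robust).
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns every quoted href value found in the HTML, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Breadth-first crawl: fetches the start page and every page reachable
    // within maxDepth link hops, visiting each URL at most once.
    // The fetcher maps a URL to its HTML, or returns null if the fetch fails.
    static Map<String, String> crawl(String startUrl, int maxDepth,
                                     Function<String, String> fetcher) {
        Map<String, String> pages = new LinkedHashMap<>();
        Set<String> seen = new HashSet<>();
        List<String> frontier = new ArrayList<>();
        frontier.add(startUrl);
        seen.add(startUrl);
        for (int depth = 0; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String url : frontier) {
                String html = fetcher.apply(url);
                if (html == null) {
                    continue; // fetch failed, skip this page
                }
                pages.put(url, html);
                for (String link : extractLinks(html)) {
                    if (seen.add(link)) {
                        next.add(link); // first time this URL is seen
                    }
                }
            }
            frontier = next;
        }
        return pages;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the network; the URLs are made up.
        Map<String, String> site = Map.of(
                "http://example.test/a", "<a href=\"http://example.test/b\">b</a>",
                "http://example.test/b", "<a href='http://example.test/c'>c</a>",
                "http://example.test/c", "no links here");
        // With a depth limit of 1, only a and b are fetched.
        System.out.println(crawl("http://example.test/a", 1, site::get).keySet());
    }
}
```

Relative links would have to be resolved against the page URL before being fetched (for instance with java.net.URL's two-argument constructor); that step is left out here to keep the sketch short.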
http://www.8080.net/hangqing/sp2.asp?cateid=2&proid=1
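I want to read the data out of the table programmatically, and I have now managed to fetch the page.
How do I parse this string? Could you give some more concrete, working code?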
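<%@page import="java.io.*,java.net.*"%>
<%=freshTable()%>
<%!
    // Fetches the page and returns the fragment between the first "</table>"
    // and the first "<script"; returns null if anything goes wrong.
    private String freshTable()
    {
        StringBuffer html = new StringBuffer();
        String url = "http://www.8080.net/hangqing/sp2.asp?cateid=2&proid=1";
        try {
            InputStream source = new URL(url).openStream();
            BufferedInputStream bis = new BufferedInputStream(source);
            int ch;
            // Each byte is widened to a char, which is equivalent to ISO-8859-1 decoding.
            while ((ch = bis.read()) > -1)
            {
                html.append((char) ch);
            }
            bis.close(); // also closes the underlying source stream
            String htmlstr = html.toString();
            // To recover the Chinese text, re-decode the bytes as GBK:
            //String htmlGBK=transCode(htmlstr,"GBK");
            int start = htmlstr.indexOf("</table>");
            int end = htmlstr.indexOf("<script");
            // substring throws if either marker is missing; the catch below then returns null.
            htmlstr = htmlstr.substring(start, end);
            return htmlstr;
        } catch (Exception e) {
            return null;
        }
    }

    // Re-interprets a string that was decoded as ISO-8859-1 using the given encoding.
    protected String transCode(String value, String enc)
    {
        try {
            if (value == null)
            {
                return null;
            }
            value = value.trim();
            value = new String(value.getBytes("ISO8859_1"), enc);
            return value;
        } catch (Exception e) {
            return null;
        }
    }
%>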
I use regular expressions to parse the page.
Is there any code I could learn from?
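As a starting point, regex-based table parsing might look something like the sketch below. It is only an illustration under some assumptions: the class name TableParser and the sample HTML are made up, the table is assumed to be simple (no nested tables, closing tags present), and for messy real-world pages a proper HTML parser is usually the safer choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and methods are illustrative only.
final class TableParser {
    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
    // One table row: <tr ...> ... </tr> (reluctant match, so rows are not merged).
    private static final Pattern ROW = Pattern.compile("<tr\\b[^>]*>(.*?)</tr>", FLAGS);
    // One cell: <td ...> ... </td> or <th ...> ... </th>.
    private static final Pattern CELL = Pattern.compile("<t[dh]\\b[^>]*>(.*?)</t[dh]>", FLAGS);
    // Any remaining tag inside a cell, such as <b> or <font ...>.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Returns the table as a list of rows, each row being a list of cell texts.
    static List<List<String>> parseRows(String html) {
        List<List<String>> rows = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                // Strip inner markup and non-breaking-space entities, then trim.
                String text = TAG.matcher(cell.group(1)).replaceAll("")
                        .replace("&nbsp;", " ")
                        .trim();
                cells.add(text);
            }
            if (!cells.isEmpty()) {
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample markup, standing in for the fetched page fragment.
        String html = "<table><tr><th>Product</th><th>Price</th></tr>"
                + "<tr><td><b>CPU</b></td><td>&nbsp;950</td></tr></table>";
        System.out.println(parseRows(html)); // prints [[Product, Price], [CPU, 950]]
    }
}
```

The string passed to parseRows would be the fragment returned by freshTable() above, once it has been decoded with the right charset.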