Call me big brother, thanks! The compiler compliance level must be JDK 1.5 or higher.
Yikes, big brother, that avatar of yours...
FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_LOAD(196)
ERROR: transport library not found: dt_socket
ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_LOAD(509)
JDWP exit error AGENT_ERROR_TRANSPORT_LOAD(196): No transports initialized [../../../src/share/back/debugInit.c:690]
I'm using Eclipse.
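String needStr = ""; // a match looks like: m:id="userId
String regex = "m:id=\"\\w*";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(findStr); // findStr is the text to search, e.g. the page source
while (m.find()) {
    needStr = m.group();
    System.out.println(needStr);
}
Adjust the regex to suit your own case,
then loop over the matches to get the href of each <a> tag.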
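The find loop above prints the whole match, prefix included. As a minimal sketch of one way to keep only the value, a capturing group can be added around the part of interest. The class name IdExtractor, the method findIds, and the sample markup are made up for this illustration and are not from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdExtractor {
    // Group 1 captures the word characters that follow m:id=" in the input.
    private static final Pattern ID_PATTERN = Pattern.compile("m:id=\"(\\w*)");

    /** Returns every value captured after m:id=" in the given text. */
    public static List<String> findIds(String text) {
        List<String> ids = new ArrayList<String>();
        Matcher m = ID_PATTERN.matcher(text);
        while (m.find()) {
            ids.add(m.group(1)); // group(1) is the captured value, without the m:id=" prefix
        }
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup, made up for this illustration.
        String sample = "<div m:id=\"userId\"></div><span m:id=\"orderNo\"></span>";
        System.out.println(findIds(sample)); // prints [userId, orderNo]
    }
}
```

The same find-and-capture loop applies to href attributes once the pattern is changed accordingly.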
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
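import java.util.regex.Pattern;

public class FetchLinks {
    private String pageSrc;

    public FetchLinks(String url) throws MalformedURLException, IOException {
        pageSrc = getPageSrc(url);
    }

    /**
     * Fetches the source of the web page at strUrl.
     * @param strUrl address of the page to fetch
     * @return the page source on a single line; an empty string if the page is empty
     * @throws MalformedURLException if strUrl is not a valid URL
     * @throws IOException if the page cannot be read
     */
    private String getPageSrc(String strUrl) throws MalformedURLException, IOException {
        StringBuffer sb = new StringBuffer();
        java.net.URL url = new java.net.URL(strUrl);
        // Note: the page is decoded with the platform default charset.
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                // Separate lines with a space so tokens split across lines are not glued together.
                sb.append(line).append(' ');
            }
        } finally {
            in.close(); // close the stream even if reading fails
        }
        return sb.toString();
    }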
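
    /**
     * Collects every <a> tag in the page that has an href attribute.
     * @return the matching tags, or null if pageSrc (the page source) is null
     */
    private List<String> getAnchorContent() {
        if (pageSrc == null) {
            return null;
        }
        List<String> list = new ArrayList<String>();
        // "<a" or "<A", whitespace, then an href attribute in any letter case before the closing ">".
        // [aA] is used rather than [a|A], which would also accept a literal '|'.
        String regex = "<[aA]\\s[^>]*[hH][rR][eE][fF][^>]*>";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(pageSrc);
        while (matcher.find()) {
            list.add(matcher.group());
        }
        return list;
    }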

    /**
     * Extracts the href value from each <a> tag.
     * Handles href="...", href='...' and unquoted href=... forms.
     * @return the href values; an empty list if there are no anchors
     */
    private List<String> getHrefs() {
        List<String> anchorList = getAnchorContent();
        List<String> list = new ArrayList<String>();
        if (anchorList == null) {
            return list;
        }
        // Group 1: double-quoted value, group 2: single-quoted value, group 3: unquoted value.
        Pattern pattern = Pattern.compile(
                "[hH][rR][eE][fF]\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))");
        for (String anchor : anchorList) {
            Matcher matcher = pattern.matcher(anchor);
            if (!matcher.find()) {
                continue;
            }
            // Exactly one of the three groups takes part in a match.
            for (int i = 1; i <= 3; i++) {
                if (matcher.group(i) != null) {
                    list.add(matcher.group(i));
                    break;
                }
            }
        }
        return list;
    }

    /**
     * Returns the URLs of all links found in the page, skipping empty href values.
     */
    public List<String> getUrls() {
        List<String> urls = new ArrayList<String>();
        for (String href : getHrefs()) {
            String url = href.trim();
            if (url.length() > 0) {
                urls.add(url);
            }
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        List<String> list = new FetchLinks("http://sports.sina.com.cn/nba/").getUrls();
        for (String str : list) {
            System.out.println(str);
        }
    }
}
The compiler compliance level must be JDK 1.5 or higher.
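To try the href extraction without a network connection, the same idea can be run against an in-memory string. This is only a sketch, not the poster's class: the name HrefDemo, the method extractHrefs, and the sample markup are made up here, and it uses a single case-insensitive pattern in place of the separate per-quote checks. A regex like this is not a full HTML parser, so unusual markup may be missed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefDemo {
    // An <a ...> tag with an href whose value is double-quoted (group 1),
    // single-quoted (group 2) or unquoted (group 3).
    private static final Pattern A_HREF = Pattern.compile(
            "<a\\s[^>]*?\\bhref\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))",
            Pattern.CASE_INSENSITIVE);

    /** Returns the non-empty href values of all <a> tags in the given HTML. */
    public static List<String> extractHrefs(String html) {
        List<String> urls = new ArrayList<String>();
        Matcher m = A_HREF.matcher(html);
        while (m.find()) {
            // Only one of the three groups takes part in each match.
            for (int g = 1; g <= 3; g++) {
                String value = m.group(g);
                if (value != null && value.length() > 0) {
                    urls.add(value);
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup, made up for this illustration.
        String html = "<a href=\"http://a.example/\">A</a>"
                + "<A HREF='/b.html'>B</A>"
                + "<a class=x href=c.html>C</a>"
                + "<a name=\"top\">no link</a>";
        System.out.println(extractHrefs(html));
        // prints [http://a.example/, /b.html, c.html]
    }
}
```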
FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_LOAD(196)
ERROR: transport library not found: dt_socket
ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_LOAD(509)
JDWP exit error AGENT_ERROR_TRANSPORT_LOAD(196): No transports initialized [../../../src/share/back/debugInit.c:690]
I'm using Eclipse.
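The type List is not generic; it cannot be parameterized with arguments <String>
Syntax error, parameterized types are only available if source level is 5.0
The method getUrls() is undefined for the type FetchLinks
Syntax error, 'for each' statements are only available if source level is 5.0
 at practice.FetchLinks.main(FetchLinks.java:116)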
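These messages point to generics and the for-each loop being rejected because the project's source level is below 5.0. In Eclipse the setting is normally found under Project > Properties > Java Compiler > Compiler compliance level. For comparison, here is a rough sketch of what the printing loop looks like when written with pre-5.0 language features only. The class PreJava5Loop and the method joinLines are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class PreJava5Loop {
    /** Joins the list items, one per line, using only pre-5.0 language features. */
    public static String joinLines(List items) {
        StringBuffer sb = new StringBuffer();
        // Raw List and an explicit Iterator instead of List<String> and for-each.
        for (Iterator it = items.iterator(); it.hasNext();) {
            String str = (String) it.next(); // a cast is needed without generics
            sb.append(str).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List list = new ArrayList(); // raw type, no type parameter
        list.add("http://sports.sina.com.cn/nba/");
        System.out.print(joinLines(list));
    }
}
```

Raising the compliance level is usually the simpler route, since the posted class relies on generics throughout.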