<a class='titleLength' href='showArticle.asp?id=2189' title='联系表' >联系表</a>
<a class='titleLength' href='showDeclaration.asp?id=1375' title='通知' >通知' </a>
<a class='titleLength' href='showNews.asp?id=4919' title='代表' >代表</a>
<a target='_blank' class='titleLength' href='showNews.asp?id=4918' title='工作会(图)' >工作会(图)</a>
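<a class='titleLength' href='showDeclaration.asp?id=1735' title='关于' >关于</a>
These are the links I want to match. Here is the expression I wrote:
<a\\s+.*class='titleLength'\\s+href=\\\'([a-z]*\\.asp\\?id=\\d+(\\&depart=\\d+)?\\\'>(.*?)<\\/a>
Testing shows it doesn't work. Could someone point out what's wrong? Thanks!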
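As written, the expression above has three problems:

- The group opened at ([a-z] is never closed, so Pattern.compile throws a PatternSyntaxException.
- [a-z]* cannot match the uppercase letters in showArticle, showDeclaration and showNews.
- \\'> expects > right after the closing quote of href, but every tag has a title attribute in between.

Below is a minimal sketch of the same idea with those three points fixed. Group 1 holds the href and group 2 holds the link text. The class name LinkRegexFix and the extract helper are hypothetical, made up for this illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkRegexFix {
    // Group 1 = href value, group 2 = link text.
    // [^>]*? skips attributes before class (e.g. target='_blank'),
    // [^>]* skips attributes after href (e.g. title='...').
    // (?:...) is non-capturing, so the optional &depart=NNN does not shift group numbers.
    static final Pattern LINK = Pattern.compile(
            "<a\\s+[^>]*?class='titleLength'\\s+href='(\\w+\\.asp\\?id=\\d+(?:&depart=\\d+)?)'[^>]*>(.*?)</a>");

    // Returns {href, trimmed link text} for every matching link.
    static List<String[]> extract(String html) {
        List<String[]> result = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            result.add(new String[] {m.group(1), m.group(2).trim()});
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<a class='titleLength' href='showArticle.asp?id=2189' title='联系表' >联系表</a>\n"
                + "<a target='_blank' class='titleLength' href='showNews.asp?id=4918' title='工作会(图)' >工作会(图)</a>\n";
        for (String[] link : extract(html)) {
            System.out.println(link[0] + " -> " + link[1]);
        }
    }
}
```

The pattern uses [^>]*? and [^>]* instead of .* so that a match cannot run past the end of one tag into the next.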
Answers:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String[] args) {
        String str = "<a class='titleLength' href='showArticle.asp?id=2189' title='联系表' >联系表 </a>" +
                "<a class='titleLength' href='showDeclaration.asp?id=1375' title='通知' >通知' </a>" +
                "<a class='titleLength' href='showNews.asp?id=4919' title='代表' >代表 </a>" +
                "<a target='_blank' class='titleLength' href='showNews.asp?id=4918' title='工作会(图)' >工作会(图) </a>" +
                "<a class='titleLength' href='showDeclaration.asp?id=1735' title='关于' >关于 </a> ";
        // Optional target attribute, four-digit id, Chinese-only title and link text with an
        // optional full-width parenthesized character such as (图), and an optional stray ' before </a>
        String regex = "<a\\s*(target='.*')?\\s*class='titleLength'\\s*href='\\w*\\.asp\\?id=\\d{4}'\\s*title='[\u4E00-\u9FA5]*?(([\u4E00-\u9FA5]))?'\\s*>[\u4E00-\u9FA5]*?(([\u4E00-\u9FA5]))?'?\\s*</a>";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(str);
        // Print every link the pattern finds
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}
Output:
<a class='titleLength' href='showArticle.asp?id=2189' title='联系表' >联系表 </a>
<a class='titleLength' href='showDeclaration.asp?id=1375' title='通知' >通知' </a>
<a class='titleLength' href='showNews.asp?id=4919' title='代表' >代表 </a>
<a target='_blank' class='titleLength' href='showNews.asp?id=4918' title='工作会(图)' >工作会(图) </a>
<a class='titleLength' href='showDeclaration.asp?id=1735' title='关于' >关于 </a>
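Note: the parentheses around 图 in the regex are full-width characters, typed in full-width input mode to match the ones in the HTML.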
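This matters because the full-width ( and ) (U+FF08 and U+FF09) are different characters from ASCII ( and ). To the regex engine they are plain literals that need no escaping. The small check below writes the full-width characters as Unicode escapes so the source does not depend on the editor's encoding. The class and method names are made up for this illustration:

```java
import java.util.regex.Pattern;

public class FullWidthParens {
    // Title from the sample HTML, with full-width parentheses U+FF08 / U+FF09
    static final String TITLE = "工作会\uFF08图\uFF09";

    // Full-width parentheses used as ordinary literal characters (no escaping needed)
    static boolean matchesFullWidth(String s) {
        return Pattern.matches("[\u4E00-\u9FA5]*\uFF08[\u4E00-\u9FA5]\uFF09", s);
    }

    // Escaped ASCII parentheses: a different pair of characters
    static boolean matchesAscii(String s) {
        return Pattern.matches("[\u4E00-\u9FA5]*\\([\u4E00-\u9FA5]\\)", s);
    }

    public static void main(String[] args) {
        System.out.println(matchesFullWidth(TITLE)); // true
        System.out.println(matchesAscii(TITLE));     // false
        // Code points of the two opening parentheses
        System.out.println(Integer.toHexString('\uFF08') + " vs " + Integer.toHexString('(')); // ff08 vs 28
    }
}
```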
This regex matches all five of the links you gave, but only because it was written around the exact format of those five.
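Because the pattern is fitted to those five samples, inputs that differ only slightly are not matched:

- \d{4} accepts only four-digit ids.
- The title must consist of Chinese characters only.
- The &depart=NNN parameter from the asker's expression is not covered.

The quick sketch below runs the same pattern against such variants. The variant strings are invented test inputs, the full-width parentheses are written as Unicode escapes, and the class name SampleFitCheck is hypothetical:

```java
import java.util.regex.Pattern;

public class SampleFitCheck {
    // The answer's regex, unchanged except that the full-width parentheses are Unicode escapes
    static final Pattern ANSWER_1 = Pattern.compile(
            "<a\\s*(target='.*')?\\s*class='titleLength'\\s*href='\\w*\\.asp\\?id=\\d{4}'"
            + "\\s*title='[\u4E00-\u9FA5]*?(\uFF08[\u4E00-\u9FA5]\uFF09)?'"
            + "\\s*>[\u4E00-\u9FA5]*?(\uFF08[\u4E00-\u9FA5]\uFF09)?'?\\s*</a>");

    static boolean found(String html) {
        return ANSWER_1.matcher(html).find();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "<a class='titleLength' href='showNews.asp?id=4919' title='代表' >代表 </a>",          // one of the five samples
            "<a class='titleLength' href='showNews.asp?id=49190' title='代表' >代表 </a>",         // five-digit id
            "<a class='titleLength' href='showNews.asp?id=4919' title='News' >News </a>",          // non-Chinese title
            "<a class='titleLength' href='showNews.asp?id=4918&depart=542' title='代表' >代表 </a>" // extra parameter
        };
        for (String s : inputs) {
            System.out.println(found(s)); // prints true, false, false, false
        }
    }
}
```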
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseHtml {
    public static void main(String[] args) {
        String html = "<a class='titleLength' href='showArticle.asp?id=2189' title='联系表' >联系表 </a>" +
                "<a class='titleLength' href='showDeclaration.asp?id=1375' title='通知' >通知' </a> " +
                "<a class='titleLength' href='showNews.asp?id=4919' title='代表' >代表 </a> " +
                "<a target='_blank' class='titleLength' href='showNews.asp?id=4918&depart=542' title='工作会(图)' >工作会(图) </a> " +
                "<a class='titleLength' href='showDeclaration.asp?id=1735' title='关于' >关于 </a> " +
                "<a href='showdeclaration.asp?sid=5424'>位置</a>";
        // The lookbehind (?<=href=') and lookahead (?=') anchor the match to the quoted href value
        // without including the quotes; &depart=NNN is optional, and ?sid= links are not matched.
        String pattern = "(?<=href=\\')(\\w+?\\.asp\\?id=\\d+(\\&depart=\\d+)?)(?=\\')";
        Matcher matcher = Pattern.compile(pattern).matcher(html);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}
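The lookaround version prints the whole href value. If the id and depart numbers are needed separately, plain capturing groups can do the same job without a lookbehind. The sketch below shows that alternative; it is not part of the answer above, and the class name HrefParts and the describe helper are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefParts {
    // Group 1 = page name, group 2 = id, group 3 = depart (null when the parameter is absent)
    static final Pattern HREF = Pattern.compile("href='(\\w+)\\.asp\\?id=(\\d+)(?:&depart=(\\d+))?'");

    static List<String> describe(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            // String concatenation turns a null group into the text "null"
            out.add(m.group(1) + " id=" + m.group(2) + " depart=" + m.group(3));
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<a class='titleLength' href='showNews.asp?id=4919' title='代表' >代表 </a> "
                + "<a target='_blank' class='titleLength' href='showNews.asp?id=4918&depart=542' title='工作会(图)' >工作会(图) </a> "
                + "<a href='showdeclaration.asp?sid=5424'>位置</a>";
        describe(html).forEach(System.out::println);
    }
}
```

For the sample string in main this prints "showNews id=4919 depart=null" and then "showNews id=4918 depart=542". The ?sid= link is skipped because the pattern requires ?id= directly after .asp.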