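Possible URL forms:
<a href="detail.html?modelId=22&infoId=10" >
<a href="detail_house.html?modelId=22&infoId=10" >
<a href="news_detail.html?modelId=22&infoId=10" >
<a href="detail_house.html?infoId=10&modelId=22" >
Rules:
1. The file name always contains "detail", but other characters such as "news" or "house" may appear before or after it.
2. The modelId and infoId parameters are always both present, but their order may vary: either modelId=22&infoId=10 or infoId=10&modelId=22.
Goal: rewrite each link as href="<original detail page name>-<modelId>-<infoId>.html", for example: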
<a href="detail.html?modelId=22&infoId=10" >
= >
<a href="detail-22-10.html" >
<a href="detail_house.html?modelId=22&infoId=10" >
= >
<a href="detail_house-22-10.html" >
<a href="news_detail.html?modelId=22&infoId=10" >
= >
<a href="news_detail-22-10.html" >
<a href="detail_house.html?infoId=10&modelId=22" >
= >
<a href="detail_house-22-10.html" > (the order is still -22-10)
What I have so far:
String pattern = "href=\"((.*?)detail(.*?))\\.html(.*?)modelId=(\\d+)(.*?)\"";
// 2 in the original flags is Pattern.CASE_INSENSITIVE
Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(content);
String newPattern = "href=\"$1-$5.html\"";
String newContent = null;
if (m.find()) {
    newContent = m.replaceAll(newPattern);
}
System.out.println(newContent);
With this, the varying order of modelId and infoId is not handled.
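A quick check of the pattern above, as a minimal sketch: the class name AttemptDemo and the sample strings are made up for illustration, and only the pattern and replacement come from the post. Both parameter orders give the same output, and infoId is lost because no group captures it.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern and replacement come from the post above.
class AttemptDemo {
    static String rewrite(String content) {
        String pattern = "href=\"((.*?)detail(.*?))\\.html(.*?)modelId=(\\d+)(.*?)\"";
        Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        // $1 is the page name and $5 is the modelId value; no group holds infoId.
        return p.matcher(content).replaceAll("href=\"$1-$5.html\"");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("<a href=\"detail.html?modelId=22&infoId=10\" >"));
        System.out.println(rewrite("<a href=\"detail.html?infoId=10&modelId=22\" >"));
        // Both lines print: <a href="detail-22.html" >
    }
}
```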
----------------
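If, as you say, the order of infoId and modelId is not fixed,
then detail_house-22-10.html could just as well be detail_house-10-22.html. How do you tell them apart?
I suggest searching for "URL regex".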
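One way around the ambiguity is to read each parameter by name rather than by position, so the output is always -modelId-infoId whatever the input order. A minimal sketch; the class ParamByName and the helpers param and suffix are hypothetical names, not from this thread, and both parameters are assumed to be present as the question states.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParamByName {
    // Hypothetical helper: returns the digits that follow "name=" in a query string, or null if absent.
    static String param(String query, String name) {
        Matcher m = Pattern.compile("(?:^|&)" + Pattern.quote(name) + "=(\\d+)").matcher(query);
        return m.find() ? m.group(1) : null;
    }

    // Builds the "-modelId-infoId" suffix regardless of the parameter order in the query.
    static String suffix(String query) {
        return "-" + param(query, "modelId") + "-" + param(query, "infoId");
    }

    public static void main(String[] args) {
        System.out.println(suffix("modelId=22&infoId=10"));
        System.out.println(suffix("infoId=10&modelId=22"));
        // Both lines print: -22-10
    }
}
```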
content = content.substring(0, content.indexOf('?')) + "\">";
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    String str = "<a href=\"detail.html?infoId=10&modelId=22\" >";
    System.out.println(temp(str));
    str = "<a href=\"detail_house.html?modelId=22&infoId=10\" >";
    System.out.println(temp(str));
    str = "<a href=\"news_detail.html?modelId=22&infoId=10\" >";
    System.out.println(temp(str));
    str = "<a href=\"detail_house.html?infoId=10&modelId=22\" >";
    System.out.println(temp(str));
}
public static String temp(String str) {
    // Group 1: optional "news_" prefix; group 2: optional "_house" suffix;
    // groups 3 and 6: the two query parameters, in either order.
    String reg = "^<a\\s+href=\"(news_)?detail(_house)?\\.html\\?((modelId=\\d+)|(infoId=\\d+))&((modelId=\\d+)|(infoId=\\d+))\"\\s*>$";
    Matcher matcher = Pattern.compile(reg).matcher(str);
    if (!matcher.find()) {
        return str; // return non-matching input unchanged instead of throwing a NullPointerException
    }
    String first = matcher.group(3);
    String second = matcher.group(6);
    // Pick each parameter by name so the output is always -modelId-infoId.
    String modelParam = first.startsWith("modelId") ? first : second;
    String infoParam = second.startsWith("infoId") ? second : first;
    StringBuilder sb = new StringBuilder("<a href=\"");
    sb.append(matcher.group(1) == null ? "" : "news_"); // keep the prefix in front of "detail"
    sb.append("detail");
    sb.append(matcher.group(2) == null ? "" : "_house");
    sb.append(modelParam.replaceAll("^.*?(\\d+)$", "-$1"));
    sb.append(infoParam.replaceAll("^.*?(\\d+)$", "-$1"));
    sb.append(".html\">");
    return sb.toString();
}
public static String detailUrlFilter(String content) {
    // [^\"?]* instead of .*? keeps a match inside one href value, so a match
    // cannot start at an earlier link and run on into a later one.
    // Case 1: modelId comes first.
    String pattern = "href=\"([^\"?]*detail[^\"?]*)\\.html\\?modelId=(\\d+)&infoId=(\\d+)\"";
    Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
    // replaceAll returns the input unchanged when nothing matches, so no find() check is needed.
    String newContent = p.matcher(content).replaceAll("href=\"$1-$2-$3.html\"");
    // Case 2: infoId comes first; swap the groups so the output is still -modelId-infoId.
    pattern = "href=\"([^\"?]*detail[^\"?]*)\\.html\\?infoId=(\\d+)&modelId=(\\d+)\"";
    p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
    newContent = p.matcher(newContent).replaceAll("href=\"$1-$3-$2.html\"");
    return newContent;
}
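For comparison, the two passes can probably be folded into one by using lookaheads to find each parameter by name. This is only a sketch under assumptions: the class name SinglePassRewrite and the method rewrite are hypothetical, and any query parameters other than modelId and infoId are simply dropped from the rewritten link.

```java
import java.util.regex.Pattern;

class SinglePassRewrite {
    // Group 1: page name containing "detail"; group 2: modelId digits; group 3: infoId digits.
    // Each lookahead scans the rest of the href value for its parameter, so order does not matter.
    private static final Pattern LINK = Pattern.compile(
        "href=\"([^\"?]*detail[^\"?]*)\\.html\\?"
            + "(?=[^\"]*\\bmodelId=(\\d+))"
            + "(?=[^\"]*\\binfoId=(\\d+))"
            + "[^\"]*\"",
        Pattern.CASE_INSENSITIVE);

    static String rewrite(String content) {
        // Always emits -modelId-infoId, whatever order the input used.
        return LINK.matcher(content).replaceAll("href=\"$1-$2-$3.html\"");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("<a href=\"detail_house.html?infoId=10&modelId=22\" >"));
        // prints: <a href="detail_house-22-10.html" >
        System.out.println(rewrite("<a href=\"news_detail.html?modelId=22&infoId=10\" >"));
        // prints: <a href="news_detail-22-10.html" >
    }
}
```

Links whose file name does not contain "detail", or that lack either parameter, are left untouched because the pattern does not match them.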