Original post and content: http://community.csdn.net/Expert/TopicView3.asp?id=4341727
In HTML, replace <a ... href="url" ... > with <a ... href="newUrl?address=url" ... >:
String pat = "(<a\\s+([^>h]|h(?!ref\\b))*href=\")([^\"]*)(\"[^>]*>)";
String html = "aaaa<a jfkdfd d fds sa href=\"sssss\" fds afjdslfd > aaa<a href=\"aaa\">";
System.out.println(html.replaceAll(pat, "$1newUrl?address=$3$4"));
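To make it easier to see what $1, $3 and $4 refer to in the replacement string, here is a small sketch that prints the capture groups of the pattern above for its first match. The class name HrefGroupsDemo and the helper groups are made up for illustration; the pattern and the sample string are the ones from the post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HrefGroupsDemo {
    // Same pattern as in the post, compiled once.
    static final Pattern PAT = Pattern.compile(
            "(<a\\s+([^>h]|h(?!ref\\b))*href=\")([^\"]*)(\"[^>]*>)");

    // Returns {group 1, group 3, group 4} of the first match, or null if nothing matches.
    static String[] groups(String html) {
        Matcher m = PAT.matcher(html);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String html = "aaaa<a jfkdfd d fds sa href=\"sssss\" fds afjdslfd > aaa<a href=\"aaa\">";
        String[] g = groups(html);
        System.out.println("$1 = [" + g[0] + "]"); // everything from <a up to and including href="
        System.out.println("$3 = [" + g[1] + "]"); // the URL itself
        System.out.println("$4 = [" + g[2] + "]"); // closing quote and the rest of the tag
    }
}
```

Group 2 is only the repeated inner group that skips over the other attributes, which is why the replacement string does not use $2.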
=======================================
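The problem above is solved, but I have one more question.
I want to check whether $3 starts with http; if it does, newUrl may need to be different. How can I do that? Thanks.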
To replace only URLs that do not start with http://, change the pattern as follows; everything else stays the same:
(<a\s+([^>h]|h(?!ref\b))*href=")((?!http://)[^"]*)("[^>]*>)
==> $1newUrl_1?address=$3$4
Code:
String pat = "(<a\\s+([^>h]|h(?!ref\\b))*href=\")((?!http://)[^\"]*)(\"[^>]*>)";

To replace only URLs that do start with http://:
(<a\s+([^>h]|h(?!ref\b))*href=")(http://[^"]*)("[^>]*>)
==> $1newUrl_2?address=$3$4
Code:
String pat = "(<a\\s+([^>h]|h(?!ref\\b))*href=\")(http://[^\"]*)(\"[^>]*>)";
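A minimal sketch of how the two patterns above might be applied together. The class name TwoPassHrefRewrite, the method rewrite and the sample input are made up for illustration. The order matters: if the http:// pass runs first, its output (newUrl_2?address=http://...) no longer starts with http://, so the second pass would rewrite it again. Running the non-http pass first avoids that, assuming newUrl_1 itself does not start with http://.

```java
final class TwoPassHrefRewrite {
    // Double-quoted href whose value does NOT start with http://
    static final String NOT_HTTP =
            "(<a\\s+([^>h]|h(?!ref\\b))*href=\")((?!http://)[^\"]*)(\"[^>]*>)";
    // Double-quoted href whose value starts with http://
    static final String HTTP =
            "(<a\\s+([^>h]|h(?!ref\\b))*href=\")(http://[^\"]*)(\"[^>]*>)";

    static String rewrite(String html) {
        // Pass 1: non-http URLs. The result still does not start with http://,
        // so pass 2 leaves these links alone.
        String step1 = html.replaceAll(NOT_HTTP, "$1newUrl_1?address=$3$4");
        // Pass 2: http:// URLs.
        return step1.replaceAll(HTTP, "$1newUrl_2?address=$3$4");
    }

    public static void main(String[] args) {
        String html = "aaaa<a jfkdfd d fds sa href=\"sssss\" fds afjdslfd > aaa<a href=\"http://aaa\">";
        System.out.println(rewrite(html));
        // prints: aaaa<a jfkdfd d fds sa href="newUrl_1?address=sssss" fds afjdslfd > aaa<a href="newUrl_2?address=http://aaa">
    }
}
```

This only covers double-quoted href values, the same as the two patterns it uses.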
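To sum up, I used four replaceAll calls to get the result, one for each case: (1) quoted, with http; (2) quoted, without http; (3) unquoted, with http; (4) unquoted, without http. My approach is probably clumsy. If anyone can offer a simpler way, I would be very grateful.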
/* Expressions (the temporary marker temp_re:// is there to prevent interference between passes):
(<a\s+([^>h]|h(?!ref\b))*href=("|'))(http://.*?)(\3[^>]*>)
==> $1temp_re://url2?address=$4$5
(<a\s+([^>h]|h(?!ref\b))*href=("|'))((?!temp_re://).*?)(\3[^>]*>)
==> $1url1?address=$4$5
(<a\s+([^>h]|h(?!ref\b))*href=(?!"|'))(http://[^\s>]*)([^>]*>)
==> $1temp_re://url2?address=$3$4
(<a\s+([^>h]|h(?!ref\b))*href=(?!"|'))((?!temp_re://)[^\s>]*)([^>]*>)
==> $1url1?address=$3$4
temp_re://
==> ""
*/
String html = "a<a href=http://aaaa>aa<a aaa href=bb aaaa>a<a jfkdfd d fds sa href=\"sssss\" fds afjdslfd > aaa<a href=\"http://aaa\"> aaa<a href=\"http://aaa\">";
// (1) quoted, starts with http://
String pat1 = "(<a\\s+([^>h]|h(?!ref\\b))*href=(\"|'))(http://.*?)(\\3[^>]*>)";
html = html.replaceAll(pat1, "$1temp_re://url2?address=$4$5");
// (2) quoted, does not start with http:// (links already marked with temp_re:// are skipped)
String pat2 = "(<a\\s+([^>h]|h(?!ref\\b))*href=(\"|'))((?!temp_re://).*?)(\\3[^>]*>)";
html = html.replaceAll(pat2, "$1url1?address=$4$5");
// (3) unquoted, starts with http://
String pat3 = "(<a\\s+([^>h]|h(?!ref\\b))*href=(?!\"|'))(http://[^\\s>]*)([^>]*>)";
html = html.replaceAll(pat3, "$1temp_re://url2?address=$3$4");
// (4) unquoted, does not start with http://
String pat4 = "(<a\\s+([^>h]|h(?!ref\\b))*href=(?!\"|'))((?!temp_re://)[^\\s>]*)([^>]*>)";
html = html.replaceAll(pat4, "$1url1?address=$3$4");
// The last step removes the temporary marker that was added to prevent interference
html = html.replaceAll("temp_re://", "");
System.out.println(html);
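As for a simpler way: one option, sketched here and not taken from the thread, is a single pass with Matcher.appendReplacement, choosing the target in Java code and not in the regex. The class name SinglePassHrefRewrite and the method rewrite are made up for illustration; url1 and url2 are the placeholders used above. One assumption differs from the four patterns: an unquoted value must be non-empty and may not contain quote characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SinglePassHrefRewrite {
    // Group 1: from <a up to and including href=
    // Group 2: the raw value, either "..." or '...' or an unquoted token
    static final Pattern HREF = Pattern.compile(
            "(<a\\s+(?:[^>h]|h(?!ref\\b))*href=)(\"[^\"]*\"|'[^']*'|[^\\s>\"']+)");

    static String rewrite(String html) {
        Matcher m = HREF.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String raw = m.group(2);
            char first = raw.charAt(0);
            // Remember the quote character (if any) so it can be put back unchanged.
            String quote = (first == '"' || first == '\'') ? String.valueOf(first) : "";
            String url = quote.isEmpty() ? raw : raw.substring(1, raw.length() - 1);
            // The decision the four patterns encode with lookaheads is a plain if here.
            String target = url.startsWith("http://") ? "url2" : "url1";
            String replacement = m.group(1) + quote + target + "?address=" + url + quote;
            // quoteReplacement keeps any $ or \ in the URL from being read as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "a<a href=http://aaaa>aa<a aaa href=bb aaaa>a<a jfkdfd d fds sa href=\"sssss\" fds afjdslfd > aaa<a href=\"http://aaa\"> aaa<a href=\"http://aaa\">";
        System.out.println(rewrite(html));
    }
}
```

Because every link is visited exactly once, no temporary marker is needed. Like the patterns above, this sketch is case-sensitive and treats only http:// as the special prefix.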