A question about nested tags and Java regular expressions

import java.util.regex.*;

public class TestRegexReplace
{
public static void main(String[] arg)
{
String s = "<a b=\"1\"> <a c=\"1\">ddd </a> </a>";
String pat = "<a[^a]+a>";
Pattern p = Pattern.compile(pat);
Matcher m = p.matcher(s);
String patTmp = "a[^a|\\=]+\\=";
Pattern pTmp = Pattern.compile(patTmp);
Matcher mTmp;
String tmp = "";
String result = "";
String resultNew = "";

while (m.find())
{
result = m.group();
mTmp = pTmp.matcher(result);
if (mTmp.find())
{
tmp = mTmp.group();
tmp = tmp.substring(2, tmp.length() - 1);
resultNew = result.replace("a ", "").replace("</a>", "</" + tmp + ">");
s = s.replace(result, resultNew);
m = p.matcher(s);
}
}
System.out.println(s);
}
}

String pat = "<a[^a]+a>";
Could you explain what this regular expression means?

My example may have been too narrow. What if
<abc bcd="1"> <abc cde="1">ddd </abc> </abc> has to become <bcd="1"> <cde="1">ddd </cde> </bcd>?
How should that be written? Thanks.

The code still has a problem: when the string is <a b=\"1\"> <a c=\"1\">a </a> </a>, the output is <a b="1"> <a c="1">a </a> </a>, i.e. nothing is replaced.
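For readers wondering what <a[^a]+a> does and why the input above comes back unchanged, here is a small sketch. The class and method names (PatternMeaningDemo, firstMatch) are made up for illustration. The pattern reads as: the literal "<a", then one or more characters that are not the letter a, then the literal "a>". As far as I can tell, it only finds the innermost tag as long as no letter a appears between the opening "<a" and the closing "a>".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternMeaningDemo {
    // Returns the first match of <a[^a]+a> in the input, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = Pattern.compile("<a[^a]+a>").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The outer tag cannot match because [^a]+ stops at the "a" of the inner "<a",
        // so the first hit is the inner element: <a c="1">ddd </a>
        System.out.println(firstMatch("<a b=\"1\"> <a c=\"1\">ddd </a> </a>"));

        // Here the element text is "a ", so [^a]+ can never reach "a>": prints null
        System.out.println(firstMatch("<a b=\"1\"> <a c=\"1\">a </a> </a>"));
    }
}
```

Since the while loop in the original program only runs when find() succeeds, the second input is printed exactly as it went in.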

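Why does it have to be a regex? For a case like this, consider parsing the string as XML.
===============================
Here is a regex for reference; it probably does not cover every case either.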
String regex = "<(\\w+)\\s(\\w+)=[^>]*>";
StringBuilder input = new StringBuilder("<abc bcd=\"1\"> <abc cde=\"1\">ddd </abc> </abc>");
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
while (m.find()) {
int start = m.start(1);
int end = m.end(1);
int index = input.lastIndexOf("/" + m.group(1)) + 1;
input.delete(index, index + end - start); // delete the closing tag name first so the earlier positions stay valid
input.insert(index, m.group(2)); // insert the new name
input.delete(start, end + 1); // remove the original node name
m.reset();
}
System.out.println(input);

Thanks for your answer.
When the string is <abc bcd=\"1\"> <abc cde=\"1\">ddd </abc> </abc><abc cde=\"2\">ddd </abc>, the result is
<bcd="1"> <cde="1">ddd </="2> </cde><cde="2">ddd </bcd>
so there is still a small problem.

In that case, all the more reason to use XML.
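A minimal sketch of the XML route, under these assumptions: the input is well-formed, every element carries exactly one attribute, and the text contains no characters that would need re-escaping. The class name XmlTagRename and the synthetic <root> wrapper are illustrative choices, not something from this thread. It uses the JDK's own DOM parser and reads the string through a StringReader.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlTagRename {
    // Rewrites each element <tag attr="v">...</tag> as <attr="v">...</attr>.
    static String rewrite(String fragment) {
        try {
            // A synthetic root lets the parser accept several top-level elements.
            String xml = "<root>" + fragment + "</root>";
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            renderChildren(doc.getDocumentElement(), out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed", e);
        }
    }

    private static void renderChildren(Node parent, StringBuilder out) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                // Assumption: exactly one attribute per element.
                Node attr = child.getAttributes().item(0);
                String name = attr.getNodeName();
                out.append('<').append(name).append("=\"")
                   .append(attr.getNodeValue()).append("\">");
                renderChildren(child, out);   // the parser already paired the tags
                out.append("</").append(name).append('>');
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                out.append(child.getNodeValue());
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(rewrite(
            "<abc bcd=\"1\"> <abc cde=\"1\">ddd </abc> </abc><abc cde=\"2\">ddd </abc>"));
    }
}
```

Because the parser does the open/close pairing, sibling elements like the trailing <abc cde="2"> no longer confuse the replacement.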

import java.util.regex.*;

public class TestRegexReplace
{
public static void main(String[] arg)
{
//String s = "<a b=\"1\"> <a c=\"1\">ddd </a> </a>";
//String s = "<a b=\"1\"> <a c=\"1\">a </a> </a>";
String s = "<abc bcd=\"1\"> <abc cde=\"1\">abc </abc> </abc>";
String patTag = "<[a-zA-Z]+[ ]";
Pattern pTag = Pattern.compile(patTag);
Matcher mTag = pTag.matcher(s);
if (mTag.find())
{
String tag = mTag.group();
tag = tag.substring(1, tag.length() - 1);
String pat = "<" + tag + ".+" + tag + ">";
Pattern p = Pattern.compile(pat);
Matcher m = p.matcher(s);
String patTmp = tag + "[ ][a-zA-Z]+\\=";
Pattern pTmp = Pattern.compile(patTmp);
Matcher mTmp;
String tmp = "";
String result = "";
String resultNew = "";
while (m.find())
{
result = m.group();
mTmp = pTmp.matcher(result);
if (mTmp.find())
{
tmp = mTmp.group();
tmp = tmp.substring(tag.length() + 1, tmp.length() - 1);
resultNew = "<" + result.substring(tag.length() + 2, result.length() - tag.length() - 1) + tmp + ">";
s = s.replace(result, resultNew);
m = p.matcher(s);
}
}
System.out.println(s);
}
}
}

import java.util.regex.*;

public class TestRegexReplace
{
public static void main(String[] arg)
{
//String s = "<a b=\"1\"> <a c=\"1\">ddd </a> </a>";
//String s = "<a b=\"1\"> <a c=\"1\">a </a> </a>";
//String s = "<abc bcd=\"1\"> <abc cde=\"1\">abc </abc> </abc>";
String s = "<abc bcd=\"1\"> <abc cde=\"1\">ddd </abc> </abc> <abc cde=\"2\">ddd </abc> <abc cde=\"2\">abc </abc>";
String patTag = "<[a-zA-Z]+[ ]";
Pattern pTag = Pattern.compile(patTag);
Matcher mTag = pTag.matcher(s);
if (mTag.find())
{
String tag = mTag.group();
tag = tag.substring(1, tag.length() - 1);
String[] sub = s.replace("> ", ">").replace(" <", "<").split(tag + "><" + tag);
sub[0] = sub[0] + tag + ">";
for (int i = 1; i < sub.length - 1; i++)
{
sub[i] = "<" + tag + sub[i] + tag + ">";
}
sub[sub.length - 1] = "<" + tag + sub[sub.length - 1];
for (int i = 0; i < sub.length; i++)
{
String pat = "<" + tag + ".+" + tag + ">";
Pattern p = Pattern.compile(pat);
Matcher m = p.matcher(sub[i]);
String patTmp = tag + "[ ][a-zA-Z]+\\=";
Pattern pTmp = Pattern.compile(patTmp);
Matcher mTmp;
String tmp = "";
String result = "";
String resultNew = "";
while (m.find())
{
result = m.group();
mTmp = pTmp.matcher(result);
if (mTmp.find())
{
tmp = mTmp.group();
tmp = tmp.substring(tag.length() + 1, tmp.length() - 1);
resultNew = "<" + result.substring(tag.length() + 2, result.length() - tag.length() - 1) + tmp + ">";
sub[i] = sub[i].replace(result, resultNew);
m = p.matcher(sub[i]);
}
}
System.out.print(sub[i]);
}
}
}
}

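tlowl, your regex skills are impressive.
I don't understand these two expressions from the code above:
String patTag = "<[a-zA-Z]+[ ]";
String patTmp = tag + "[ ][a-zA-Z]+\\=";
If possible, could you explain them?
The code above is already very complete, but I just tried another string and found there is still a small problem:
String s = "<abc bcd=\"1\">1<abc cde=\"1\">ddd</abc>2</abc>3<abc cde=\"2\">ddd</abc>4<abc cde=\"2\">abc</abc>5";
The result is: <bcd="1">1<cde="1">ddd</abc>2</abc>3<abc cde="2">ddd</cde>4<abc cde="2">abc</bcd>5bcd>
Reading through tlowl's code, I found the cause: the string never contains "><", which the split relies on. No one can cover every case. Thank you, tlowl, for the careful and thorough answers.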
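As a rough illustration of what the two expressions asked about above extract, the sketch below applies them to a sample string and repeats the same substring arithmetic used in the posted code. The class and method names (TagPatternDemo, tagName, firstAttrName) are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagPatternDemo {
    // "<[a-zA-Z]+[ ]" matches "<", one or more letters, then one space, e.g. "<abc ".
    static String tagName(String s) {
        Matcher m = Pattern.compile("<[a-zA-Z]+[ ]").matcher(s);
        if (!m.find()) {
            return null;
        }
        String hit = m.group();                      // "<abc "
        return hit.substring(1, hit.length() - 1);   // drop "<" and the space: "abc"
    }

    // tag + "[ ][a-zA-Z]+\\=" matches the tag name, a space, letters, then "=",
    // e.g. "abc bcd=". Note: tag is concatenated unquoted, which is fine for plain letters.
    static String firstAttrName(String s, String tag) {
        Matcher m = Pattern.compile(tag + "[ ][a-zA-Z]+\\=").matcher(s);
        if (!m.find()) {
            return null;
        }
        String hit = m.group();                                     // "abc bcd="
        return hit.substring(tag.length() + 1, hit.length() - 1);   // "bcd"
    }

    public static void main(String[] args) {
        String s = "<abc bcd=\"1\">1<abc cde=\"1\">ddd</abc>2</abc>";
        String tag = tagName(s);
        System.out.println(tag + " / " + firstAttrName(s, tag));
    }
}
```

So the first expression finds the element name and the second finds the name of the attribute that will replace it.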

import java.util.regex.*;

public class TestRegexReplace
{
// Returns the text between two tags, e.g. the "3" in abc>3<abc
public static String getLast(Matcher m, String tag)
{
String last = "";
if (m.find())
{
last = m.group();
last = last.substring(tag.length() + 1, last.length() - tag.length() - 1);
}
return last;
}

public static void main(String[] arg)
{
String s = "<abc bcd=\"1\">1 <abc cde=\"1\">ddd </abc>2 </abc>3 <abc cde=\"2\">ddd </abc>4 <abc cde=\"2\">abc </abc>5";
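String last = "";
s = s.replace("> ", ">").replace(" <", "<");
// Matches the "<abc " in <abc bcd=. The space is written as [ ] only to make it visible;
// a space is not a special character in a Java regex unless Pattern.COMMENTS is set.
String patTag = "<[a-zA-Z]+[ ]";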
Pattern pTag = Pattern.compile(patTag);
Matcher mTag = pTag.matcher(s);
if (mTag.find())
{
String tag = mTag.group();
tag = tag.substring(1, tag.length() - 1);

String strSplit = tag + ">[^<|>]*<" + tag;
Pattern pSplit = Pattern.compile(strSplit);
Matcher mSplit = pSplit.matcher(s);

String[] sub = s.split(strSplit);
last = getLast(mSplit, tag);
sub[0] = sub[0] + tag + ">" + last;
for (int i = 1; i < sub.length - 1; i++)
{
last = getLast(mSplit, tag);
sub[i] = "<" + tag + sub[i] + tag + ">" + last;
}
sub[sub.length - 1] = "<" + tag + sub[sub.length - 1];
for (int i = 0; i < sub.length; i++)
{
String pat = "<" + tag + ".+" + tag + ">";
Pattern p = Pattern.compile(pat);
Matcher m = p.matcher(sub[i]);
// Matches the "abc bcd=" in <abc bcd=, from which bcd is extracted below. Again, [ ] is just a visible way to write a space.
String patTmp = tag + "[ ][a-zA-Z]+\\=";
Pattern pTmp = Pattern.compile(patTmp);
Matcher mTmp;
String tmp = "";
String result = "";
String resultNew = "";
while (m.find())
{
result = m.group();
mTmp = pTmp.matcher(result);
if (mTmp.find())
{
tmp = mTmp.group();
tmp = tmp.substring(tag.length() + 1, tmp.length() - 1);
resultNew = "<" + result.substring(tag.length() + 2, result.length() - tag.length() - 1) + tmp + ">";
sub[i] = sub[i].replace(result, resultNew);
m = p.matcher(sub[i]);
}
}
System.out.print(sub[i]);
}
}
}
}
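This now handles the example you gave. It looks like you want to replace tags in HTML or XML. There is one precondition, though: the first-level tags must be uniform, that is, every tag like abc in this example must have the same name.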

The earlier problem is still there: when the string contains nothing that matches abc>[^<|>]*<abc, the result is wrong.

  String patTmp = "a[^a|\\=]+\\=";
        Pattern pTmp = Pattern.compile(patTmp);
        Matcher mTmp;
        String tmp = "";
for (int i = 1; i < sub.length - 1; i++)
            {
                last = getLast(mSplit, tag);
                sub[i] = "<" + tag + sub[i] + tag + ">" + last;
            }
            sub[sub.length - 1] = "<" + tag + sub[sub.length - 1];

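As I said, this code assumes your first-level tags are uniform; it has no way to enumerate each tag at the first level.
I realized this yesterday and will find time in the next couple of days to improve it. The approach needs to change, and after that it should meet your requirements, unless you add new ones, haha.
Also, sorry, I can't use chat software at work.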

I'm not very clever, so I use the crudest method, but it handles the problem most directly!
public static void main(String[] args) {
// expected result: <b="1"> <c="1">ddd </c> </b>
String str = "<a b=\"1\"> <a c=\"1\">ddd </a> </a>";
str = str.replaceAll("(<)((a) (b))(=\"1\"> <)((a) (c))(=\"1\">ddd </)(a)(> </)(a)(>)","$1$4$5$8$9$8$11$4$13");
System.out.println(str);
}

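Think of it this way: break the whole into pieces, then reassemble them into the new whole you want.
This is a fairly flexible way to work.
In the syntax, each "(" opens a node, which Java calls a group. Group indices start at 1, and $1 refers to the first group.
For example, in (s)(a(bcd)), (s) is group 1, (a(bcd)) is group 2, and (bcd) is group 3. In other words, groups are numbered in the order their opening "(" appears.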
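The numbering rule described above can be checked with a short sketch. GroupNumberingDemo and its groups helper are names invented for this example; Matcher.group(int), Matcher.groupCount() and the $n syntax in String.replaceAll are standard Java.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Returns group 0 (the whole match) followed by every numbered group,
    // or an empty array if the input does not match the regex completely.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("(s)(a(bcd))", "sabcd");
        for (int i = 1; i < g.length; i++) {
            System.out.println("group " + i + " = " + g[i]);
        }
        // Groups can be reassembled in any order in the replacement string.
        System.out.println("sabcd".replaceAll("(s)(a(bcd))", "$3$1"));
    }
}
```

Counting opening parentheses from left to right gives s, abcd, bcd for groups 1 to 3, and the last line reorders the pieces into bcds.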

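Sorry for the late reply; work has been very busy lately.
Going by the later examples, this still looks like parsing HTML or XML tags. If so, the hard part is how to split the string by tag. I thought it over again: with regular expressions alone, at my current level I cannot split this string cleanly. Almost any character can appear in HTML or XML, so for a given tag element I cannot say which characters will never occur inside it, and therefore cannot pin down its two boundaries. For example, in >[^<|>]*< the > and < are the two boundaries, and [^<|>]* captures the characters between them. If the OP can state some restrictions, I think there would still be a way to split the string with regular expressions.
Another thought: XML parsing may be a good option. So far, though, I have not found a function in Java that reads an XML string directly into an XmlDocument (C# has one), so it would have to go through a stream. Alternatively, use the most primitive method: walk the string from start to end, examine each character, and split it into an array containing all child elements, much like extracting the contents of parentheses when evaluating arithmetic. That is the basic idea, but either approach is fairly involved to implement, and I probably won't have time to write the code in the near future.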
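The "walk the string from start to end" idea can be sketched with a regex used only to find individual tags and a stack that pairs each closing tag with the most recent unclosed opening tag, the same way parentheses are matched. This is one possible reading of that idea, not the poster's code. It assumes every opening tag has exactly one double-quoted attribute and that the tags are properly nested; StackTagRename and rewrite are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StackTagRename {
    // Either an opening tag with one attribute, e.g. <abc bcd="1">,
    // or a closing tag, e.g. </abc>.
    private static final Pattern TAG = Pattern.compile(
            "<(\\w+)\\s+(\\w+)=(\"[^\"]*\")\\s*>|</(\\w+)\\s*>");

    static String rewrite(String input) {
        Deque<String> open = new ArrayDeque<>();   // attribute names of unclosed tags
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(input);
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());    // copy the text between tags as-is
            if (m.group(1) != null) {
                // Opening tag: remember its attribute name and emit <attr="value">.
                open.push(m.group(2));
                out.append('<').append(m.group(2)).append('=')
                   .append(m.group(3)).append('>');
            } else if (!open.isEmpty()) {
                // Closing tag: it belongs to the innermost unclosed opening tag.
                out.append("</").append(open.pop()).append('>');
            } else {
                out.append(m.group());             // unbalanced closer: leave untouched
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite(
            "<abc bcd=\"1\">1<abc cde=\"1\">ddd</abc>2</abc>3<abc cde=\"2\">ddd</abc>4"));
    }
}
```

The sketch does not verify that a closing tag's name matches the opening tag it is paired with, so mismatched input is rewritten without complaint.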

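For a replacement this complex, everyone is reaching for regex matching, which is quite inefficient. Regular expressions fit when:
1. The match target is precise. Don't take a whole file and throw a single regex at it; that is very slow. The right way is to split the input into pieces, down to strings of no more than about 100 characters.
2. The match target has first been narrowed by other means wherever possible, such as XML parsing, other text-parsing methods, or the basic String API.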

In Java this can be done with a regular expression:
String[] strs = new String[] {
"<a b=\"1\"> <a c=\"1\">ddd </a> </a>",
"<abc bcd=\"1\"> <abc cde=\"1\">ddd </abc> </abc>",
"<a b=\\\"1\\\"> <a c=\\\"1\\\">a </a> </a>"
};
for (String str : strs) {
System.out.println("before: " + str);
// $3 is the outer attribute name and $6 the inner one, so the inner close is </$6> and the outer close is </$3>
str = str.replaceAll("^(<\\w+ +)((\\w+)=[^>=]+> )(<\\w+ +)((\\w+)=[^>=]+>[^<]+)(<\\/\\w+>)([^<]+)(<\\/\\w+>)",
"<$2<$5</$6>$8</$3>");
System.out.println("after: " + str);
}

The output is:
before: <a b="1"> <a c="1">ddd </a> </a>
after: <b="1"> <c="1">ddd </c> </b>
before: <abc bcd="1"> <abc cde="1">ddd </abc> </abc>
after: <bcd="1"> <cde="1">ddd </cde> </bcd>
before: <a b=\"1\"> <a c=\"1\">a </a> </a>
after: <b=\"1\"> <c=\"1\">a </c> </b>

Great question, OP; it helped me solve a tough problem.
Hats off to tlowl...
