Why does it have to be regex? For a case like this, consider parsing the input as XML.
{
public static void main(String[] arg)
{
String s = "<a b=\"1\"> <a c=\"1\">ddd </a> </a>";
String pat = "<a[^a]+a>";
Pattern p = Pattern.compile(pat);
Matcher m = p.matcher(s);
String patTmp = "a[^a|\\=]+\\=";
Pattern pTmp = Pattern.compile(patTmp);
Matcher mTmp;
String tmp = "";
String result = "";
String resultNew = "";
while (m.find())
{
result = m.group();
mTmp = pTmp.matcher(result);
if (mTmp.find())
{
tmp = mTmp.group();
tmp = tmp.substring(2, tmp.length() - 1);
resultNew = result.replace("a ", "").replace("</a>", "</" + tmp + ">");
s = s.replace(result, resultNew);
m = p.matcher(s);
}
}
System.out.println(s);
}
}
Could the expert above please explain what this regex is meant to express?
And if <abc bcd="1"> <abc cde="1">ddd </abc> </abc> has to become <bcd="1"> <cde="1">ddd </cde> </bcd>,
how should it be written? Thanks.
===============================
A regex for reference; it probably does not cover every case.
String regex = "<(\\w+)\\s(\\w+)=[^>]*>";
StringBuilder input = new StringBuilder("<abc bcd=\"1\"> <abc cde=\"1\">ddd </abc> </abc>");
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
while (m.find()) {
int start = m.start(1);
int end = m.end(1);
int index = input.lastIndexOf("/" + m.group(1)) + 1;
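input.delete(index, index + end - start); // delete the closing tag name first so the earlier offsets stay valid
input.insert(index, m.group(2)); // insert the new name
input.delete(start, end + 1); // remove the original node name and the whitespace after it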
m.reset();
}
System.out.println(input);
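To make the pattern above easier to read, here is a minimal sketch that only prints what its two capture groups pick up. The class name `OpenTagGroups` and the helper `describe` are made up for this illustration and are not part of the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OpenTagGroups {
    // Same pattern as in the post: "<", tag name (group 1), one whitespace
    // character, attribute name (group 2), "=", then anything up to ">".
    static final Pattern OPEN_TAG = Pattern.compile("<(\\w+)\\s(\\w+)=[^>]*>");

    // Returns "tag->attribute" for every opening tag found, joined by commas.
    static String describe(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = OPEN_TAG.matcher(input);
        while (m.find()) {
            if (out.length() > 0) {
                out.append(",");
            }
            out.append(m.group(1)).append("->").append(m.group(2));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints abc->bcd,abc->cde
        System.out.println(describe("<abc bcd=\"1\"> <abc cde=\"1\">ddd </abc> </abc>"));
    }
}
```

Closing tags such as </abc> are never matched, because "/" is not a word character. That is why the posted loop has to locate them separately with lastIndexOf.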
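With the string <abc bcd="1"> <abc cde="1">ddd </abc> </abc><abc cde="2">ddd </abc>, the output is
<bcd="1"> <cde="1">ddd </="2> </cde><cde="2">ddd </bcd>
so it still has a small problem: lastIndexOf always picks the last closing tag in the whole string, not the one that pairs with the current opening tag.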
{
public static void main(String[] arg)
{
//String s = "<a b=\"1\"> <a c=\"1\">ddd </a> </a>";
//String s = "<a b=\"1\"> <a c=\"1\">a </a> </a>";
String s = "<abc bcd=\"1\"> <abc cde=\"1\">abc </abc> </abc>";
String patTag = "<[a-zA-Z]+[ ]";
Pattern pTag = Pattern.compile(patTag);
Matcher mTag = pTag.matcher(s);
if (mTag.find())
{
String tag = mTag.group();
tag = tag.substring(1, tag.length() - 1);
String pat = "<" + tag + ".+" + tag + ">";
Pattern p = Pattern.compile(pat);
Matcher m = p.matcher(s);
String patTmp = tag + "[ ][a-zA-Z]+\\=";
Pattern pTmp = Pattern.compile(patTmp);
Matcher mTmp;
String tmp = "";
String result = "";
String resultNew = "";
while (m.find())
{
result = m.group();
mTmp = pTmp.matcher(result);
if (mTmp.find())
{
tmp = mTmp.group();
tmp = tmp.substring(tag.length() + 1, tmp.length() - 1);
resultNew = "<" + result.substring(tag.length() + 2, result.length() - tag.length() - 1) + tmp + ">";
s = s.replace(result, resultNew);
m = p.matcher(s);
}
}
System.out.println(s);
}
}
}
{
public static void main(String[] arg)
{
//String s = "<a b=\"1\"> <a c=\"1\">ddd </a> </a>";
//String s = "<a b=\"1\"> <a c=\"1\">a </a> </a>";
//String s = "<abc bcd=\"1\"> <abc cde=\"1\">abc </abc> </abc>";
String s = "<abc bcd=\"1\"> <abc cde=\"1\">ddd </abc> </abc> <abc cde=\"2\">ddd </abc> <abc cde=\"2\">abc </abc>";
String patTag = "<[a-zA-Z]+[ ]";
Pattern pTag = Pattern.compile(patTag);
Matcher mTag = pTag.matcher(s);
if (mTag.find())
{
String tag = mTag.group();
tag = tag.substring(1, tag.length() - 1);
String[] sub = s.replace("> ", ">").replace(" <", "<").split(tag + "><" + tag);
sub[0] = sub[0] + tag + ">";
for (int i = 1; i < sub.length - 1; i++)
{
sub[i] = "<" + tag + sub[i] + tag + ">";
}
sub[sub.length - 1] = "<" + tag + sub[sub.length - 1];
for (int i = 0; i < sub.length; i++)
{
String pat = "<" + tag + ".+" + tag + ">";
Pattern p = Pattern.compile(pat);
Matcher m = p.matcher(sub[i]);
String patTmp = tag + "[ ][a-zA-Z]+\\=";
Pattern pTmp = Pattern.compile(patTmp);
Matcher mTmp;
String tmp = "";
String result = "";
String resultNew = "";
while (m.find())
{
result = m.group();
mTmp = pTmp.matcher(result);
if (mTmp.find())
{
tmp = mTmp.group();
tmp = tmp.substring(tag.length() + 1, tmp.length() - 1);
resultNew = "<" + result.substring(tag.length() + 2, result.length() - tag.length() - 1) + tmp + ">";
sub[i] = sub[i].replace(result, resultNew);
m = p.matcher(sub[i]);
}
}
System.out.print(sub[i]);
}
}
}
}
About the regular expressions above,
String patTag = "<[a-zA-Z]+[ ]";
String patTmp = tag + "[ ][a-zA-Z]+\\=";
I don't understand what they mean. If possible, I would appreciate an explanation.
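As a small illustration of what these two patterns match, the sketch below runs each of them against one of the sample strings. The class `TagPatternDemo` and the helper `firstMatch` are hypothetical names introduced only for this demo.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagPatternDemo {
    // Returns the first match of regex in input, or "" if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String s = "<abc bcd=\"1\"> <abc cde=\"1\">abc </abc> </abc>";

        // patTag: "<", one or more letters, then a single space.
        String hit = firstMatch("<[a-zA-Z]+[ ]", s);

        // Drop the leading "<" and the trailing space to get the tag name.
        String tag = hit.substring(1, hit.length() - 1);

        // patTmp: the tag name, a space, one or more letters, then "=".
        String attr = firstMatch(tag + "[ ][a-zA-Z]+\\=", s);

        // prints [<abc ] [abc] [abc bcd=]
        System.out.println("[" + hit + "] [" + tag + "] [" + attr + "]");
    }
}
```

In other words, patTag finds the first tag name, and patTmp finds that tag name together with the attribute name that will replace it.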
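The code above is already very thorough, but I just tried another string and found there is still a small problem:
String s = "<abc bcd=\"1\">1<abc cde=\"1\">ddd</abc>2</abc>3<abc cde=\"2\">ddd</abc>4<abc cde=\"2\">abc</abc>5";
The result is: <bcd="1">1<cde="1">ddd</abc>2</abc>3<abc cde="2">ddd</cde>4<abc cde="2">abc</bcd>5bcd>
Reading through tlowl's code above, I found that the cause is that the string contains no "><". Nobody can anticipate every case. Thank you, tlowl, for such a careful and thorough answer!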
{
// extract the text between the two tag names in a string such as abc>3<abc (here, the 3)
public static String getLast(Matcher m, String tag)
{
String last = "";
if (m.find())
{
last = m.group();
last = last.substring(tag.length() + 1, last.length() - tag.length() - 1);
}
return last;
}
public static void main(String[] arg)
{
String s = "<abc bcd=\"1\">1 <abc cde=\"1\">ddd </abc>2 </abc>3 <abc cde=\"2\">ddd </abc>4 <abc cde=\"2\">abc </abc>5";
String last = "";
s = s.replace("> ", ">").replace(" <", "<");
String patTag = "<[a-zA-Z]+[ ]"; // regex that matches the <abc in <abc bcd=. The brackets around the space are optional; a bare space matches the same way
Pattern pTag = Pattern.compile(patTag);
Matcher mTag = pTag.matcher(s);
if (mTag.find())
{
String tag = mTag.group();
tag = tag.substring(1, tag.length() - 1);
String strSplit = tag + ">[^<|>]*<" + tag;
Pattern pSplit = Pattern.compile(strSplit);
Matcher mSplit = pSplit.matcher(s);
String[] sub = s.split(strSplit);
last = getLast(mSplit, tag);
sub[0] = sub[0] + tag + ">" + last;
for (int i = 1; i < sub.length - 1; i++)
{
last = getLast(mSplit, tag);
sub[i] = "<" + tag + sub[i] + tag + ">" + last;
}
sub[sub.length - 1] = "<" + tag + sub[sub.length - 1];
for (int i = 0; i < sub.length; i++)
{
String pat = "<" + tag + ".+" + tag + ">";
Pattern p = Pattern.compile(pat);
Matcher m = p.matcher(sub[i]);
String patTmp = tag + "[ ][a-zA-Z]+\\="; // regex used to get the bcd= part of <abc bcd=. Again, the brackets around the space are optional
Pattern pTmp = Pattern.compile(patTmp);
Matcher mTmp;
String tmp = "";
String result = "";
String resultNew = "";
while (m.find())
{
result = m.group();
mTmp = pTmp.matcher(result);
if (mTmp.find())
{
tmp = mTmp.group();
tmp = tmp.substring(tag.length() + 1, tmp.length() - 1);
resultNew = "<" + result.substring(tag.length() + 2, result.length() - tag.length() - 1) + tmp + ">";
sub[i] = sub[i].replace(result, resultNew);
m = p.matcher(sub[i]);
}
}
System.out.print(sub[i]);
}
}
}
}
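The example you gave now works. It looks like what you want is to replace tags in HTML or XML. There is one precondition, though: the first-level tags must all be the same, that is, tags like abc in this example must all be identical.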
Pattern pTmp = Pattern.compile(patTmp);
Matcher mTmp;
String tmp = "";
for (int i = 1; i < sub.length - 1; i++)
{
last = getLast(mSplit, tag);
sub[i] = "<" + tag + sub[i] + tag + ">" + last;
}
sub[sub.length - 1] = "<" + tag + sub[sub.length - 1];
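I already thought of this yesterday and will find time over the next couple of days to improve it. The approach needs to change; once that is done it should meet your requirements. Heh, unless you add new requirements again.
Also, sorry, I can't use chat software at the office.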
public static void main(String[] args) {
String str = "<a b=\"1\"> <a c=\"1\">ddd </a> </a>";//<b="1"> <c="1">ddd </c> </b>
str = str.replaceAll("(<)((a) (b))(=\"1\"> <)((a) (c))(=\"1\">ddd </)(a)(> </)(a)(>)","$1$4$5$8$9$8$11$4$13");
System.out.println(str);
}
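Handling it this way is fairly flexible.
In the syntax, each "(" opens a node, which Java calls a group. Indices start at 1, and $1 refers to the first group.
For example, in (s)(a(bcd)), (s) is group 1, (a(bcd)) is group 2, and (bcd) is group 3. In other words, groups are numbered in the order in which their "(" appears.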
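The numbering rule can be checked with a few lines of Java. This is only a sketch; the class `GroupNumbering` and its helper `group` are names invented for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    // Returns group n of the pattern (s)(a(bcd)) matched against "sabcd".
    static String group(int n) {
        Matcher m = Pattern.compile("(s)(a(bcd))").matcher("sabcd");
        if (!m.matches()) {
            throw new IllegalStateException("pattern did not match");
        }
        return m.group(n);
    }

    public static void main(String[] args) {
        System.out.println(group(0)); // whole match: sabcd
        System.out.println(group(1)); // s
        System.out.println(group(2)); // abcd
        System.out.println(group(3)); // bcd

        // In a replacement string, $n refers to the same groups.
        System.out.println("sabcd".replaceAll("(s)(a(bcd))", "$3$1")); // bcds
    }
}
```

Group 0 is always the whole match, which is why the user-defined groups start at 1.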
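Judging by the examples the OP gave later, this still looks like parsing HTML or XML tags. If so, the hard part is how to split the string by tag. After thinking it over carefully, I can't split this string perfectly with regex alone, at least not at my current level. Almost any character can appear in HTML or XML, so for a given tag element I can't tell which characters will never occur inside it, and therefore can't pin down its two boundaries. For example, in >[^<|>]*< the > and < are the two boundaries, and [^<|>]* matches the characters between them. Of course, if the OP can state some restrictions, I think there would still be a way to split the string with regex.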
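One detail about the boundary pattern is worth seeing in action: inside a character class, "|" is a literal character, not alternation, so [^<|>] also refuses to match a pipe. The sketch below compares it with [^<>]. The class `BoundaryDemo` and the helper `between` are hypothetical names for this demo.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Returns the first match of regex in input, or "" if there is none.
    static String between(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        // The text between two tags is matched together with its boundaries.
        System.out.println(between(">[^<|>]*<", "abc>3<abc"));           // >3<

        // With "|" in the class, text containing a pipe is not matched.
        System.out.println(between(">[^<|>]*<", "a>x|y<b").isEmpty());   // true

        // Without "|" in the class, the same text is matched.
        System.out.println(between(">[^<>]*<", "a>x|y<b"));              // >x|y<
    }
}
```

For the sample strings in this thread the two classes behave the same, since none of them contains a pipe.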
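Another thought: parsing the input as XML may be a good approach. So far, though, I haven't found a function in Java that reads an XML-formatted string directly into an XmlDocument (C# has one), so it would have to go through a stream. Alternatively, use the most primitive method: walk the string from start to end, examine each character, and split it into an array containing all child elements, much like splitting out the contents of parentheses in arithmetic expressions. That is the basic idea, but either approach is fairly complicated to implement. I probably won't have time to write this code in the near future.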
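For what it's worth, the JDK's standard DOM API can read XML from a String by wrapping it in a StringReader and an InputSource. The sketch below assumes the input is well-formed XML with a single root element that has at least one attribute. The class `XmlFromString` and the helper `rootTagAndAttribute` are made-up names for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class XmlFromString {
    // Parses an XML string held in memory; no file is involved.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) { // ParserConfigurationException, SAXException or IOException
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    // Returns "tagName/firstAttributeName" for the root element.
    // Assumes the root element has at least one attribute.
    static String rootTagAndAttribute(String xml) {
        Element root = parse(xml).getDocumentElement();
        return root.getTagName() + "/" + root.getAttributes().item(0).getNodeName();
    }

    public static void main(String[] args) {
        // prints abc/bcd
        System.out.println(rootTagAndAttribute("<abc bcd=\"1\"> <abc cde=\"1\">ddd </abc> </abc>"));
    }
}
```

Two caveats apply to this thread's problem. A string with several top-level elements would need to be wrapped in a single root before parsing, and the desired output such as <bcd="1"> is not well-formed XML, so it would have to be written out by hand rather than by an XML serializer.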
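The right approach: 1. Break the input down, into Strings no longer than 100 characters. 2. Wherever possible, use other means to narrow down what the regex has to match, such as XML parsing, other text-parsing techniques, or the basic String API.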
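One way to keep the regex small, in the spirit of the advice above, is to let it match only single tags and track nesting with a stack. The following is a minimal sketch, not any poster's method. It assumes tags are well nested, every opening tag starts with an attribute, and attribute values contain no ">". The class `TagRenamer` and the method `rename` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagRenamer {
    // First alternative: an opening tag with a leading attribute
    // (group 1 = attribute text, group 2 = attribute name).
    // Second alternative: a closing tag.
    private static final Pattern TAG =
            Pattern.compile("<\\w+\\s+((\\w+)=[^>]*)>|</\\w+>");

    static String rename(String input) {
        Deque<String> open = new ArrayDeque<>(); // attribute names of tags still open
        StringBuffer out = new StringBuffer();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            String replacement;
            if (m.group(1) != null) {
                // Opening tag: remember the attribute name, drop the tag name.
                open.push(m.group(2));
                replacement = "<" + m.group(1) + ">";
            } else if (!open.isEmpty()) {
                // Closing tag: pair it with the most recent opening tag.
                replacement = "</" + open.pop() + ">";
            } else {
                // Unbalanced closing tag: leave it unchanged.
                replacement = m.group();
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rename("<abc bcd=\"1\"> <abc cde=\"1\">ddd </abc> </abc>"));
        System.out.println(rename(
                "<abc bcd=\"1\">1<abc cde=\"1\">ddd</abc>2</abc>3<abc cde=\"2\">ddd</abc>4<abc cde=\"2\">abc</abc>5"));
    }
}
```

Because each closing tag is paired with the most recently opened tag, sibling elements and text between tags do not need special handling, and the first-level tags do not have to share one name.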
String[] strs = new String[] {
"<a b=\"1\"> <a c=\"1\">ddd </a> </a>",
"<abc bcd=\"1\"> <abc cde=\"1\">ddd </abc> </abc>",
"<a b=\\\"1\\\"> <a c=\\\"1\\\">a </a> </a>"
};
for (String str : strs) {
System.out.println("before: " + str);
str = str.replaceAll("^(<\\w+ +)((\\w+)=[^>=]+> )(<\\w+ +)((\\w+)=[^>=]+>[^<]+)(<\\/\\w+>)([^<]+)(<\\/\\w+>)",
"<$2<$5</$6>$8</$3>"); // $6 closes the inner tag, $3 closes the outer tag
System.out.println("after: " + str);
}
The output is:
before: <a b="1"> <a c="1">ddd </a> </a>
after: <b="1"> <c="1">ddd </c> </b>
before: <abc bcd="1"> <abc cde="1">ddd </abc> </abc>
after: <bcd="1"> <cde="1">ddd </cde> </bcd>
before: <a b=\"1\"> <a c=\"1\">a </a> </a>
after: <b=\"1\"> <c=\"1\">a </c> </b>
tlowl is a real expert. Deep respect...