The original poster's regular expression itself is fine. I just don't know what function a.getstr is, so I couldn't test that part. Below I reproduce the effect you want using your regular expression.

package learning;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegTest {
    public static void main(String[] args) {
        String pattern = "<quote([\\s\\S]*?)</quote>";
        String str = "3333<quote> But he will not object if others use the tikka masala name.</quote>"
                + "<quote><img src=\"http://forum.csdn.net/PointForum/ui/scripts/csdn/Plugin/003/monkey/5.gif\" alt=\"\">"
                + "\"When we invented this dish, we never thought it would be that popular, and now"
                + "it's the most popular dish in Great Britain.</quote>"
                + "<quote>\"We don't mind if other people use it. Everyone should enjoy that.\""
                + "</quote>5555";
        System.out.println(str.length() + ": " + str);

        // Create the Pattern object
        Pattern r = Pattern.compile(pattern);
        // Now create the Matcher object
        Matcher m = r.matcher(str);

        int j = 0;           // number of matches seen so far
        int startIndex = 0;  // index just after the first "<quote>"
        int lastIndex = 0;   // end index of the last match
        while (m.find()) {
            String value = m.group();  // group 0 is the whole match
            if (j == 0) {
                // Assumes the first opening tag is exactly "<quote>" (no attributes)
                startIndex = m.start() + "<quote>".length();
            }
            System.out.println("Found value: " + value + " end=" + m.end() + " length=" + value.length());
            lastIndex = m.end();
            // str = str.replace(value, str.getstr(value.length(), sub));
            j++;
        }
        if (j == 0) {
            System.out.println("No <quote> block found");
            return;
        }

        // Step back over the final "</quote>" so only the text in between is masked
        lastIndex -= "</quote>".length();
        String subStr = str.substring(startIndex, lastIndex);
        System.out.println(subStr);
        System.out.println();

        // Build a run of '*' with the same length as the text being masked
        StringBuilder buffer = new StringBuilder();
        for (int i = 0; i < subStr.length(); i++) {
            buffer.append('*');
        }
        str = str.replace(subStr, buffer.toString());
        System.out.println(str.length() + ": " + str);
    }
}
Thanks, everyone, for the help. I had hoped to explain the problem in a simple way, but that led to a lot of misunderstanding, so let me add the actual requirements of the project.
I am doing entity recognition, and the positions of the recognized entities must stay exact.
The requirement, however, is that characters nested inside <quote></quote> must not be recognized, so I want to replace the characters inside the nesting with the same number of asterisks (*). That is what a.getstr does: it produces however many asterisks I need.
So my remaining question is the nesting problem described in the first post. Here is a complete file that needs to be processed:
</post>
<post author="Emma" datetime="2009-08-24T17:26:00" id="p10">
<quote orig_author="Article 15">
<quote>
LOS ANGELES A law enforcement official tells The Associated Press that the Los Angeles County coroner has ruled Michael Jackson's death a homicide.The finding makes it more likely criminal charges will be filed against the doctor who was with the pop star when he died.</quote><a href="http://news.yahoo.com/s/ap/20090824/ap_en_ot/us_michael_jackson_investigation;_ylt=AuR4__jpJgWVbb65Dgl9Btis0NUE;_ylu=X3oDMTNlMmpobnVzBGFzc2V0A2FwLzIwMDkwODI0L3VzX21pY2hhZWxfamFja3Nvbl9pbnZlc3RpZ2F0aW9uBGNwb3MDMgRwb3MDNgRwdANob21lX2Nva2UEc2VjA3luX3RvcF9zdG9yeQRzbGsDYXBzb3VyY2Vjb3Jv">AP Source: Coroner rules Jackson's death homicide - Yahoo! News</a></quote><quote>
A designation of homicide means that Jackson died at the hands of another, but does not necessarily mean a crime was committed.</quote><img src="http://www.usmessageboard.com/images/smilies/eusa_eh.gif"/>
</post>
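The masking requirement described above (replace each quoted span with the same number of asterisks so that offsets elsewhere do not shift) could be sketched roughly as follows. This is only an illustration: `QuoteMasker` and `maskMatches` are hypothetical names standing in for the poster's unseen a.getstr helper, and the sample input is flat (no nested quotes).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteMasker {
    // Hypothetical helper: overwrites every character of each regex match
    // with '*', so the string length and all other offsets are unchanged.
    static String maskMatches(String text, String regex) {
        StringBuilder out = new StringBuilder(text);
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            for (int i = m.start(); i < m.end(); i++) {
                out.setCharAt(i, '*');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "ab<quote>cd</quote>ef";
        String masked = maskMatches(text, "<quote([\\s\\S]*?)</quote>");
        System.out.println(text.length() + ": " + text);
        System.out.println(masked.length() + ": " + masked);
    }
}
```

Writing into a StringBuilder by index, rather than calling str.replace on the matched text, avoids accidentally masking an identical substring that appears elsewhere in the input.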
Try removing this tag.
How about trying this?
After removing it, the match still ends at the first </quote>. Thanks for the reply.
I also don't know how to edit the post to fix it.
Sorry about that, everyone!
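Sorry for the late reply!
I tried your code. It does implement the replacement I wanted, replacing the matched text with *, but the match itself is still not paired up correctly. What I mean is that the first <quote> is still not matched with the last </quote>; the result is the same as the one I had originally.
Thank you! This must be a problem with my regex.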
When I tested it, it did match, and the tags were paired up correctly.
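The two posters may simply have been testing different inputs. A small sketch of that, where `LazyQuoteDemo`, `findAll`, and both sample strings are made up for illustration: on flat input the reluctant pattern pairs each <quote> with its own </quote>, while on nested input the match stops at the first </quote> it reaches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyQuoteDemo {
    // Collects every full match (group 0) of the regex in the text.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String lazy = "<quote([\\s\\S]*?)</quote>";
        String flat = "1<quote>a</quote>2<quote>b</quote>3";
        String nested = "<quote orig_author=\"x\"><quote>in</quote>out</quote>";

        // Flat input: prints [<quote>a</quote>, <quote>b</quote>]
        System.out.println(findAll(lazy, flat));
        // Nested input: prints [<quote orig_author="x"><quote>in</quote>]
        System.out.println(findAll(lazy, nested));
    }
}
```

The reluctant quantifier *? takes as few characters as possible, so the outer opening tag ends up paired with the inner closing tag and the trailing "out</quote>" is left unmatched.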
Reference:
http://www.imkevinyang.com/2010/07/javajs%E5%A6%82%E4%BD%95%E4%BD%BF%E7%94%A8%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F%E5%8C%B9%E9%85%8D%E5%B5%8C%E5%A5%97html%E6%A0%87%E7%AD%BE.html
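An approach from a senior classmate of mine solved the problem. Adapted to my case, the pattern becomes:
String pattern = "<quote[\\s\\S]*?>(<quote[\\s\\S]*?>[\\s\\S]*?</quote>|[\\s\\S]*?)*?</quote>";
This handles one level of nesting. Thanks to everyone above.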
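For anyone reusing this, here is a rough sketch of the pattern above applied to the masking task, next to an alternative that is not from this thread: a plain depth counter over the tags, which does not depend on how deep the nesting goes. The class and method names are hypothetical, and the counter assumes that no attribute value contains '>' and that there are no self-closing <quote/> tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedQuoteMasker {
    // Pattern quoted from the post above; it covers one level of nesting.
    static final String ONE_LEVEL =
        "<quote[\\s\\S]*?>(<quote[\\s\\S]*?>[\\s\\S]*?</quote>|[\\s\\S]*?)*?</quote>";

    // Overwrites characters in [from, to) with '*', keeping the length.
    static void fill(StringBuilder sb, int from, int to) {
        for (int i = from; i < to; i++) {
            sb.setCharAt(i, '*');
        }
    }

    // Masks every match of the one-level pattern.
    static String maskWithRegex(String text) {
        StringBuilder out = new StringBuilder(text);
        Matcher m = Pattern.compile(ONE_LEVEL).matcher(text);
        while (m.find()) {
            fill(out, m.start(), m.end());
        }
        return out.toString();
    }

    // Alternative (an assumption, not the thread's method): walk the tags in
    // order, track the nesting depth, and mask each outermost block.
    static String maskByDepth(String text) {
        StringBuilder out = new StringBuilder(text);
        Matcher tag = Pattern.compile("<quote\\b[^>]*>|</quote>").matcher(text);
        int depth = 0;
        int start = -1;
        while (tag.find()) {
            if (tag.group().startsWith("</")) {
                if (depth == 0) {
                    continue;  // stray closing tag: ignore it
                }
                depth--;
                if (depth == 0) {
                    fill(out, start, tag.end());
                }
            } else {
                if (depth == 0) {
                    start = tag.start();
                }
                depth++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "ab<quote id=\"1\"> x <quote>inner</quote> y</quote>cd<quote>z</quote>ef";
        System.out.println(maskWithRegex(text));
        System.out.println(maskByDepth(text));
    }
}
```

On input with at most one level of nesting, the two methods should produce the same masked string. The regex is not expected to pair tags correctly beyond that depth, and its nested reluctant quantifiers may backtrack heavily or recurse deeply on long inputs, which is where the counter might be the safer choice.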