Why can't the regex <.*?> match HTML comment tags?

An old topic: http://topic.csdn.net/t/20050221/08/3794052.html covers a solution. But I want to know why: doesn't .* match !-- -- as well?

Solutions »

String str = "<!-- 121212-->";
str = str.replaceAll("(?s)<(.*?)>","$1");
System.out.println(str);

The output is
!-- 121212--
which shows that it does match. I'm not sure what problem you ran into.

Sorry, I should have given more detail. I had been using <.*> to strip the tags from one of my web pages and extract its text. The page is as follows:

<%@ page language="java" import="java.util.*" pageEncoding="gb2312"%>
<%
String path = request.getContextPath();
String basePath = request.getScheme()+"://"+request.getServerName()+":"+request.getServerPort()+path+"/";
%><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<base href="<%=basePath%>">
<title>My JSP 'index.jsp' starting page</title>
<meta http-equiv="pragma" content="no-cache">
<meta http-equiv="cache-control" content="no-cache">
<meta http-equiv="expires" content="0">
<meta http-equiv="keywords" content="keyword1,keyword2,keyword3">
<meta http-equiv="description" content="This is my page">
<!--
<link rel="stylesheet" type="text/css" href="styles.css">
-->
</head>
<body>
<input type="hidden" value="7">
This is my JSP page. 哈哈<br>
for Index,index.<br>
<%int currentPage;
if (request.getParameter("currentPage") == null)
currentPage = 1;
else
currentPage = Integer.valueOf(request.getParameter("currentPage"));
%>
<a href="index.jsp?currentPage=<%=currentPage+1%>"><%=currentPage%></a>
</body>
</html>

The code:

public class Experiment {
    public static void main(String args[]) throws CorruptIndexException,IOException{
        try {
            HTTPSocket http = new HTTPSocket();
            http.send(args[0], null);
            System.out.println(http.getBody());
            String output = getText(http.getBody());
            System.out.println(output);
        } catch (Exception e) {
            // exceptions are silently ignored here
        }
    }
    public static String getText(String original)
    {
        // remove everything that looks like a tag
        String regex = "<.*>";
        String output = original.replaceAll(regex,"");
        return output;
    }
}

The result of the parse is:

<!--
-->
This is my JSP page. 哈哈
for Index,index.
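The output above is consistent with how . behaves by default in java.util.regex: it does not match line terminators, so <.*> can only match within a single line. A line holding only <!-- contains no >, and a line holding only --> contains no <, so neither is matched, while a whole line such as <title>...</title> is swallowed by the greedy match. A minimal sketch of this, assuming a shortened stand-in for the page (the class name GreedyPerLineDemo and the sample string are hypothetical, not from the thread):

```java
public class GreedyPerLineDemo {
    // Shortened, hypothetical stand-in for the page in the question.
    static final String PAGE =
        "<title>My page</title>\n" +
        "<!--\n" +
        "<link rel=\"stylesheet\" href=\"styles.css\">\n" +
        "-->\n" +
        "This is my JSP page.<br>\n";

    // Same call as in the question: greedy quantifier, no DOTALL flag,
    // so each match is confined to a single line.
    static String strip(String original) {
        return original.replaceAll("<.*>", "");
    }

    public static void main(String[] args) {
        // The title line disappears entirely (greedy match from the first
        // '<' to the last '>' on that line); "<!--" and "-->" are left behind.
        System.out.println(strip(PAGE));
    }
}
```

Running main should show the two halves of the comment surviving on their own lines, which matches the result reported in the question.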
<!-- --> was not removed. I suspect the line breaks (\r\n) are the cause.

//Try it this way
Pattern p = Pattern.compile("<.*>",Pattern.DOTALL);
Matcher m = p.matcher(original);
String output = m.replaceAll("");

I remember you said in another thread of mine that DOTALL lets . match line terminators (thanks!). I have tried that as well, though, with the following code:

public class Experiment {
    public static void main(String args[]) throws CorruptIndexException,IOException{
        try {
            HTTPSocket http = new HTTPSocket();
            http.send(args[0], null);
            System.out.println(http.getBody());
            String output = getText(http.getBody());
            System.out.println(output);
            System.out.println("end");
        } catch (Exception e) {
            // exceptions are silently ignored here
        }
    }
    public static String getText(String original)
    {
        // lazy quantifier, and . also matches line terminators
        Pattern p = Pattern.compile("<.*?>",Pattern.DOTALL);
        Matcher m = p.matcher(original);
        StringBuffer sb = new StringBuffer();
        while (m.find())
        {
            // replace each match with the empty string
            m.appendReplacement(sb,"");
        }
        m.appendTail(sb);
        return sb.toString();
    }
}

The test data is still the simple HTML I posted earlier, and the result is:

My JSP 'index.jsp' starting page
-->
This is my JSP page. 哈哈
for Index,index.
1
end

The comment still was not removed, or rather only half of it was. Also, the greedy <.*> cannot be used with DOTALL, because it matches the whole <html……/html> span and removes every character.

But with DOTALL, . can match line breaks.
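The leftover --> follows from lazy matching: <.*?> stops at the first > after <!--, and in this page that is the > closing the <link ...> tag inside the comment, so only the first half of the comment is consumed. One possible way around it, sketched below, is to try a dedicated comment alternative, <!--.*?-->, before a generic tag pattern. The class name CommentAwareStripDemo, the method names and the sample string are hypothetical; this is only a sketch, and a regex remains a rough tool for HTML compared with a real parser.

```java
import java.util.regex.Pattern;

public class CommentAwareStripDemo {
    // Shortened, hypothetical stand-in for the page in the question.
    static final String PAGE =
        "<head>\n" +
        "<title>My page</title>\n" +
        "<!--\n" +
        "<link rel=\"stylesheet\" href=\"styles.css\">\n" +
        "-->\n" +
        "</head>\n" +
        "<body>This is my JSP page.<br></body>\n";

    // The pattern from the second attempt in the thread.
    static final Pattern LAZY_TAG =
        Pattern.compile("<.*?>", Pattern.DOTALL);

    // Comment alternative first, then any other tag. With DOTALL the lazy
    // ".*?" may cross line breaks until it reaches the first "-->".
    static final Pattern COMMENT_OR_TAG =
        Pattern.compile("<!--.*?-->|<[^>]*>", Pattern.DOTALL);

    static String stripLazy(String original) {
        return LAZY_TAG.matcher(original).replaceAll("");
    }

    static String stripCommentAware(String original) {
        return COMMENT_OR_TAG.matcher(original).replaceAll("");
    }

    public static void main(String[] args) {
        // The first call leaves "-->" behind; the second removes the whole comment.
        System.out.println(stripLazy(PAGE));
        System.out.println("----");
        System.out.println(stripCommentAware(PAGE));
    }
}
```

The order of the alternatives matters here: at a position where <!-- starts, the engine tries the comment alternative first and only falls back to <[^>]*> for ordinary tags.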