Why can't the regex <.*?> match HTML comment tags?

This is an old topic, and http://topic.csdn.net/t/20050221/08/3794052.html already discussed a solution.
But I'd like to know why: doesn't .* match the !-- ... -- part?

String str = "<!-- 121212-->";
str = str.replaceAll("(?s)<(.*?)>", "$1");
System.out.println(str);
The output is
!-- 121212--
which proves that it does match. I'm not sure what problem you ran into.
Sorry, earlier I was trying to use <.*> to extract the plain text from one of my web pages. The page is:
<%@ page language="java" import="java.util.*" pageEncoding="gb2312"%>
<%
String path = request.getContextPath();
String basePath = request.getScheme()+"://"+request.getServerName()+":"+request.getServerPort()+path+"/";
%><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <base href="<%=basePath%>">

    <title>My JSP 'index.jsp' starting page</title>
<meta http-equiv="pragma" content="no-cache">
<meta http-equiv="cache-control" content="no-cache">
<meta http-equiv="expires" content="0">
<meta http-equiv="keywords" content="keyword1,keyword2,keyword3">
<meta http-equiv="description" content="This is my page">

  </head>

  <body>
   <input type="hidden" value="7">
    This is my JSP page.  哈哈<br>
    for Index,index.<br>
    <%int currentPage;
     if (request.getParameter("currentPage") == null)
currentPage = 1;
else
currentPage = Integer.valueOf(request.getParameter("currentPage"));
     %>
    <a href="index.jsp?currentPage=<%=currentPage+1%>"><%=currentPage%></a>
  </body>
</html>
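public class Experiment {
    public static void main(String[] args) {
        try {
            // HTTPSocket: the HTTP client class used in this post (its import is not shown)
            HTTPSocket http = new HTTPSocket();
            http.send(args[0], null);
            System.out.println(http.getBody());
            String output = getText(http.getBody());
            System.out.println(output);
        } catch (Exception e) {
            e.printStackTrace(); // don't swallow errors silently
        }
    }

    // Greedy <.*> without DOTALL: '.' stops at line breaks, so each match
    // runs from the first '<' to the last '>' on a single line.
    public static String getText(String original) {
        String regex = "<.*>";
        String output = original.replaceAll(regex, "");
        return output;
    }
}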
The extracted result looks like this:






    This is my JSP page.  哈哈
    for Index,index.


The comment was not removed. I suspect it's because of the line breaks (\r\n).
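A quick way to check the line-break suspicion: in java.util.regex, '.' does not match line terminators such as \r or \n unless the DOTALL flag (inline form (?s)) is set. The class name and sample strings below are made up for illustration; the sketch only shows that a one-line comment matches <.*?> while a comment split across \r\n lines does not, until DOTALL is added.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // Without DOTALL, '.' does not match line terminators such as \r or \n.
    static final Pattern PLAIN = Pattern.compile("<.*?>");
    // With DOTALL (same as the inline flag (?s)), '.' matches them too.
    static final Pattern DOTALL = Pattern.compile("<.*?>", Pattern.DOTALL);

    public static void main(String[] args) {
        String oneLine = "<!-- note -->";
        String multiLine = "<!--\r\nnote\r\n-->";
        System.out.println(PLAIN.matcher(oneLine).matches());    // true: '!' and '-' are ordinary characters
        System.out.println(PLAIN.matcher(multiLine).matches());  // false: '.' cannot cross \r\n
        System.out.println(DOTALL.matcher(multiLine).matches()); // true
    }
}
```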
// Try this:
Pattern p = Pattern.compile("<.*>", Pattern.DOTALL);
Matcher m = p.matcher(original);
String result = m.replaceAll(""); // replaceAll returns a new String; 'original' is unchanged
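One detail worth checking with this kind of code: Matcher.replaceAll returns the result and never modifies the input, because Java Strings are immutable. Calling it without keeping the return value has no visible effect. A small sketch (class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllResultDemo {
    static String stripTags(String original) {
        Matcher m = Pattern.compile("<.*?>", Pattern.DOTALL).matcher(original);
        // replaceAll builds and returns a new String; 'original' is not modified.
        return m.replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";
        Matcher m = Pattern.compile("<.*?>").matcher(html);
        m.replaceAll("");                    // result discarded: html is unchanged
        System.out.println(html);            // <b>bold</b>
        System.out.println(stripTags(html)); // bold
    }
}
```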
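I remember you said in another thread of mine that DOTALL lets . match line terminators (thanks for that!), but I had already tried it. Here is the code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Experiment {
    public static void main(String[] args) {
        try {
            // HTTPSocket: the HTTP client class used in this post (its import is not shown)
            HTTPSocket http = new HTTPSocket();
            http.send(args[0], null);
            System.out.println(http.getBody());
            String output = getText(http.getBody());
            System.out.println(output);
            System.out.println("end");
        } catch (Exception e) {
            e.printStackTrace(); // don't swallow errors silently
        }
    }

    // Lazy <.*?> with DOTALL: each match may span lines but stops at the first '>'.
    public static String getText(String original) {
        Pattern p = Pattern.compile("<.*?>", Pattern.DOTALL);
        Matcher m = p.matcher(original);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, ""); // copy the text before the match, drop the match
        }
        m.appendTail(sb); // copy the text after the last match
        return sb.toString();
    }
}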
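For an empty replacement, the find/appendReplacement/appendTail loop in the code above gives the same result as a single Matcher.replaceAll(""), so the loop is not what changes the outcome; the pattern is. A sketch comparing the two (class name and sample input are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendReplacementDemo {
    static final Pattern TAG = Pattern.compile("<.*?>", Pattern.DOTALL);

    // The explicit loop, as in the code above.
    static String withLoop(String original) {
        Matcher m = TAG.matcher(original);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, "");
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // The one-call equivalent for a fixed replacement string.
    static String withReplaceAll(String original) {
        return TAG.matcher(original).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>one</p>\n<p>two</p>";
        System.out.println(withLoop(html).equals(withReplaceAll(html))); // true
    }
}
```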
The test data was still the simple HTML page I posted earlier, and the result was:



    My JSP 'index.jsp' starting page



-->




    This is my JSP page.  哈哈
    for Index,index.

    1

end
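The comment still wasn't removed, or rather only half of it was. Also, I can't match with <.*>, because the greedy match runs from <html ... to /html> and removes every character.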
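The exact comment in the page isn't visible above, but a lone --> left in the output is consistent with a comment that itself contains another tag. The lazy <.*?> then stops at the first '>' it meets, which is the end of the inner tag, not the end of the comment. The sketch below uses a hypothetical comment of that shape and also shows the greedy <.*> case, which runs from the first '<' to the last '>' and leaves nothing.

```java
import java.util.regex.Pattern;

public class HalfCommentDemo {
    static String strip(String regex, String s) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical input: a comment wrapping a tag (not taken from the original page).
        String html = "<!--\n<link href=\"styles.css\">\n-->\n<a href=\"x\">1</a>";
        // Lazy: the first match is "<!--\n<link ...>", so "-->" is left behind.
        System.out.println(strip("<.*?>", html));
        // Greedy: one match from the first '<' to the last '>', so everything goes.
        System.out.println("[" + strip("<.*>", html) + "]"); // []
    }
}
```

With the lazy pattern the printed text is an empty line, then -->, then 1, which mirrors the "half removed" result reported above.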
But with DOTALL, . can match line breaks.
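DOTALL only fixes the line-break part; the comment still has to be matched as a unit. One common workaround, sketched here under the assumption that comments are never nested and that no '>' appears inside attribute values, is two passes: first remove whole comments with <!--.*?--> (DOTALL), then remove the remaining tags with <[^>]*>. The class name and the getText helper are hypothetical, and regex is not an HTML parser; for real pages a parser such as jsoup is more robust.

```java
import java.util.regex.Pattern;

public class StripHtmlDemo {
    // Pass 1: a whole comment, from "<!--" to the nearest "-->", across lines.
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    // Pass 2: any remaining tag. [^>]* cannot run past the first '>' and
    // already matches newlines, so no DOTALL flag is needed.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String getText(String original) {
        String noComments = COMMENT.matcher(original).replaceAll("");
        return TAG.matcher(noComments).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<title>My page</title>\n<!--\n<link href=\"styles.css\">\n-->\n<a href=\"x\">1</a>";
        System.out.println(getText(html));
    }
}
```

Removing comments first matters: if tags were stripped first, the inner tag would be consumed together with "<!--" and the closing "-->" would again be left over.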