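How can I strip all HTML tags from scraped HTML source, so that the result looks like the text you get by selecting everything on the page, copying it, and pasting it into Notepad?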

Note that text pasted into Notepad this way still contains some spaces.
JSP or C# source code would be best.

Solutions »

import java.util.regex.Pattern;

public class Html2Text {
    public static String html2Text(String inputString) {
        String htmlStr = inputString; // string that contains HTML tags
        String textStr = "";
        try {
            // Regex for HTML comments
            String regEx_annotate = "<!--[\\s\\S]*?-->";
            // Regex for script blocks, including their content
            String regEx_script = "<[\\s]*?script[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?script[\\s]*?>";
            // Regex for style blocks, including their content
            String regEx_style = "<[\\s]*?style[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?style[\\s]*?>";
            // Regex for any remaining HTML tag
            String regEx_html = "<[^>]+>";

            // 1. Strip comments
            htmlStr = Pattern.compile(regEx_annotate, Pattern.CASE_INSENSITIVE).matcher(htmlStr).replaceAll("");
            // 2. Strip script blocks
            htmlStr = Pattern.compile(regEx_script, Pattern.CASE_INSENSITIVE).matcher(htmlStr).replaceAll("");
            // 3. Strip style blocks
            htmlStr = Pattern.compile(regEx_style, Pattern.CASE_INSENSITIVE).matcher(htmlStr).replaceAll("");
            // 4. Strip all remaining tags
            htmlStr = Pattern.compile(regEx_html, Pattern.CASE_INSENSITIVE).matcher(htmlStr).replaceAll("");

            textStr = htmlStr;
        } catch (Exception e) {
            // Covers a null input as well; an empty string is returned in that case
            System.err.println("Html2Text: " + e.getMessage());
        }
        return textStr; // the plain-text result
    }
}
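First filter out HTML comments (<!-- ... -->) with a regular expression.
Then filter out the script, style and head blocks together with everything they contain (head is not covered by the code above, but it can be matched the same way). Finally, filter out whatever is still enclosed in < >.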
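
The filtering order described above, extended a little toward the "paste into Notepad" effect, could be sketched as follows. This is only a rough sketch under some assumptions: the class name HtmlToTextSketch, the method toText, the list of tags treated as line breaks, and the handful of entities decoded are all made up for illustration and are not part of the original answer. Regex-based stripping also will not cope with every malformed page.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not from the original answer.
class HtmlToTextSketch {
    // HTML comments
    private static final Pattern COMMENT = Pattern.compile("<!--[\\s\\S]*?-->");
    // script, style and head blocks together with their content
    private static final Pattern HIDDEN_BLOCK = Pattern.compile(
            "<\\s*(script|style|head)\\b[^>]*>[\\s\\S]*?<\\s*/\\s*\\1\\s*>",
            Pattern.CASE_INSENSITIVE);
    // Tags assumed to end a line in the pasted text (illustrative list)
    private static final Pattern LINE_BREAK_TAG = Pattern.compile(
            "<\\s*(br|/p|/div|/li|/tr|/h[1-6])\\b[^>]*>",
            Pattern.CASE_INSENSITIVE);
    // Any remaining tag
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    static String toText(String html) {
        if (html == null) {
            return "";
        }
        String s = COMMENT.matcher(html).replaceAll("");   // 1. comments
        s = HIDDEN_BLOCK.matcher(s).replaceAll("");         // 2. script / style / head
        s = LINE_BREAK_TAG.matcher(s).replaceAll("\n");     // keep line structure
        s = TAG.matcher(s).replaceAll("");                  // 3. everything else in < >

        // Decode a few common entities; &amp; goes last so "&amp;lt;" stays "&lt;"
        s = s.replace("&nbsp;", " ").replace("&lt;", "<").replace("&gt;", ">")
             .replace("&quot;", "\"").replace("&amp;", "&");

        // Collapse runs of spaces and tabs, then tidy up around line breaks
        s = s.replaceAll("[ \\t\\f\\r]+", " ");
        s = s.replaceAll(" ?\\n ?", "\n").replaceAll("\\n{2,}", "\n");
        return s.trim();
    }
}
```

Calling HtmlToTextSketch.toText(html) on the scraped source returns the visible text with one line per paragraph-like element. For real-world pages an HTML parser is usually more robust than regular expressions, but the order of the steps stays the same.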
