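How can I remove HTML tags such as <div>....</div> from FCKeditor (fck) output?
Requirement: after the tags are removed and the text is exported to Word, the appearance should stay roughly the same; at the very least the line breaks should survive. Stripping the tags already works [with a regular expression], but once exported to Word the content is all jammed together, with no line breaks or formatting left.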

Solutions »

  1.   

    Strip the HTML tags:
    str = Regex.Replace(str, @"<[^>]+>", "");
    For line breaks, replace the break markers with Environment.NewLine (\r\n).
    Save as HTML.
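
The steps in this answer can be sketched roughly as follows. This is only an illustration under assumptions: the class `HtmlToText`, the method `StripTagsKeepBreaks`, and the particular list of tags treated as line-ending are hypothetical choices, not something given in the thread. The idea is to turn `<br>` and closing block tags into `Environment.NewLine` first, and only then strip whatever tags remain.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlToText
{
    // Hypothetical helper (not from the thread): strips tags but keeps line breaks.
    public static string StripTagsKeepBreaks(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // 1. Replace <br>, <br/> and closing block-level tags with a line break.
        //    The tag list here is an assumption; extend it as needed.
        string s = Regex.Replace(
            html,
            @"<br\s*/?>|</(div|p|li|h[1-6]|tr)\s*>",
            Environment.NewLine,
            RegexOptions.IgnoreCase);

        // 2. Remove every remaining tag.
        s = Regex.Replace(s, @"<[^>]+>", string.Empty);

        // 3. Drop leading/trailing whitespace, including the final line break.
        return s.Trim();
    }
}
```

For example, `HtmlToText.StripTagsKeepBreaks("<div>a</div><div>b<br/>c</div>")` yields the three lines `a`, `b`, `c` separated by `Environment.NewLine`. A regex-based approach like this is fragile on malformed or nested markup, so it is probably only suitable for simple editor output.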
      

  2.   

    // Requires: using System.Text.RegularExpressions;
    public static string StripHTML(string HTML) // found by googling "StripHTML"
            {
                string[] Regexs = {
                                      @"<script[^>]*?>.*?</script>",
                                      @"<(\/\s*)?!?((\w+:)?\w+)(\w+(\s*=?\s*(([""'])(\\[""'tbnr]|[^\7])*?\7|\w+)|.{0})|\s)*?(\/\s*)?>",
                                      @"([\r\n])[\s]+",
                                      @"&(quot|#34);",
                                      @"&(amp|#38);",
                                      @"&(lt|#60);",
                                      @"&(gt|#62);",
                                      @"&(nbsp|#160);",
                                      @"&(iexcl|#161);",
                                      @"&(cent|#162);",
                                      @"&(pound|#163);",
                                      @"&(copy|#169);",
                                      @"&#(\d+);",
                                      @"-->",
                                      @"<!--.*\n"
                                   };
                string[] Replaces = {
                                          "",
                                          "",
                                          "",
                                          "\"",
                                          "&",
                                          "<",
                                          ">",
                                          " ",
                                          "\xa1", //chr(161),
                                          "\xa2", //chr(162),
                                          "\xa3", //chr(163),
                                          "\xa9", //chr(169),
                                          "",
                                          "\r\n",
                                          ""
                                      };
                string s = HTML;
                for (int i = 0; i < Regexs.Length; i++)
                {
                    s = new Regex(Regexs[i], RegexOptions.Multiline | RegexOptions.IgnoreCase).Replace(s, Replaces[i]);
                }
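                // string.Replace returns a new string, so the result must be assigned back.
                s = s.Replace("<", "");
                s = s.Replace(">", "");
                s = s.Replace("\r\n", ""); // removes every remaining line break; drop this line to keep them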
                return s;
            } 
    This strips the HTML tags, but afterwards the content all runs together.
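
As a side note on the entity replacements in this answer: a possible alternative to maintaining the entity table by hand is to let the framework decode entities after the tags are removed. The sketch below swaps in `System.Net.WebUtility.HtmlDecode` (available in .NET Framework 4.0 and later) for the hand-written table; the class `EntityDecoding` and method `StripAndDecode` are hypothetical names used only for illustration, and this version does nothing about line breaks.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class EntityDecoding
{
    // Hypothetical helper (not from the thread): strip tags, then decode entities.
    public static string StripAndDecode(string html)
    {
        // Remove tags first, so that a decoded "&lt;" is not mistaken for a tag.
        string s = Regex.Replace(html, @"<[^>]+>", string.Empty);

        // Decode named and numeric entities such as &amp; or &#169;.
        return WebUtility.HtmlDecode(s);
    }
}
```

For example, `EntityDecoding.StripAndDecode("<b>a &lt; b &amp; c</b>")` returns `a < b & c`.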
      

  3.   


    string xx = Regex.Replace(source, @"<div[^>]*>([\s\S]*?)</div>", "<p>$1</p>"); // source: the original HTML string
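
A minimal, self-contained version of this suggestion might look like the sketch below. The class `DivToParagraph` and method `Convert` are hypothetical names, and `RegexOptions.IgnoreCase` is an added assumption so that `<DIV>` is handled too. Because each `<div>` becomes a `<p>`, Word should render each block as its own paragraph when the result is saved as HTML.

```csharp
using System.Text.RegularExpressions;

public static class DivToParagraph
{
    // Hypothetical helper (not from the thread): rewrites <div>...</div> as <p>...</p>.
    public static string Convert(string html)
    {
        // [\s\S]*? matches lazily across line breaks, so each opening <div>
        // is paired with the nearest following </div>.
        return Regex.Replace(
            html,
            @"<div[^>]*>([\s\S]*?)</div>",
            "<p>$1</p>",
            RegexOptions.IgnoreCase);
    }
}
```

For example, `DivToParagraph.Convert("<div class=\"x\">line 1</div><div>line 2</div>")` returns `<p>line 1</p><p>line 2</p>`. Note that nested `<div>` elements are not paired correctly by this pattern, so it is best suited to flat editor output.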