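I am building a website. The news text is read from the content column (one I created myself) in the database, but the content contains HTML tags such as <p></p> and <span></span>. Only the first part of each news item can be displayed, so the text has to be truncated. The truncation only yields real characters if these tags are removed first; otherwise, if for example the 20th character happens to be "<", what gets displayed is not text. The tags need to be stripped on the server side. How can I do that? Any help from the experts would be appreciated...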

Solutions »

  1.   

    Try (?<=\<.+\>)[^\<]+(?=\</.+\>) to extract the content inside the tags.
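
To see what that pattern actually captures, here is a minimal sketch. The class name TagContentDemo and the method Extract are made up for illustration; only the pattern comes from the reply above. It relies on .NET allowing variable-length lookbehind.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class; only the pattern itself comes from the thread.
public static class TagContentDemo
{
    // The pattern from the reply, unchanged.
    private static readonly Regex Inner =
        new Regex(@"(?<=\<.+\>)[^\<]+(?=\</.+\>)");

    // Returns every text run the pattern matches, in document order.
    public static List<string> Extract(string html)
    {
        var result = new List<string>();
        foreach (Match m in Inner.Matches(html))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

Because the lookahead demands a closing tag right after the text, a run that is followed by an opening tag is skipped: for "<p>Hello <span>world</span></p>" only "world" is returned. For truncating news text, removing the tags is probably a better fit than extracting what sits between them.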
      

  2.   

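    // Quick version: remove everything that looks like a tag.
    string str = System.Text.RegularExpressions.Regex.Replace(Htmlstring, @"<[^>]*>", "");
    // More thorough version (requires using System.Text.RegularExpressions;).
    // Singleline lets ".*?" span line breaks inside a script block.
    Htmlstring = Regex.Replace(Htmlstring, @"<script[^>]*?>.*?</script>", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    Htmlstring = Regex.Replace(Htmlstring, @"<(.[^>]*)>", "", RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"([\r\n])[\s]+", "", RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"-->", "", RegexOptions.IgnoreCase);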
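
The original question also needs the truncation step, so here is a rough sketch that strips first and cuts second. NewsSummary, Summarize and maxChars are hypothetical names, and the "..." suffix is just one possible choice, not something from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration: strip tags, then truncate.
public static class NewsSummary
{
    public static string Summarize(string html, int maxChars)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // Drop script blocks together with their bodies;
        // Singleline lets "." cross line breaks.
        string text = Regex.Replace(html, @"<script[^>]*?>.*?</script>", "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Drop every remaining tag.
        text = Regex.Replace(text, @"<[^>]*>", "");

        // Collapse whitespace runs into single spaces and trim the ends.
        text = Regex.Replace(text, @"\s+", " ").Trim();

        // Cut only after the tags are gone, so the cut cannot land inside a tag.
        return text.Length <= maxChars
            ? text
            : text.Substring(0, maxChars) + "...";
    }
}
```

For example, NewsSummary.Summarize("<p>Hello <span>world</span></p>", 5) gives "Hello...". The sketch assumes maxChars is not negative and does not decode entities such as &nbsp;, so those still count toward the length.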
      

  3.   

            /// <summary>
            /// Removes HTML tags.
            /// </summary>
            /// <param name="HTML">Source string</param>
            /// <returns>Result</returns>
            public static string StripHTML(string HTML) // found by googling "StripHTML"
            {
                string[] Regexs =
                                    {
                                        @"<script[^>]*?>.*?</script>",
                                        @"<(\/\s*)?!?((\w+:)?\w+)(\w+(\s*=?\s*(([""'])(\\[""'tbnr]|[^\7])*?\7|\w+)|.{0})|\s)*?(\/\s*)?>",
                                        @"([\r\n])[\s]+",
                                        @"&(quot|#34);",
                                        @"&(amp|#38);",
                                        @"&(lt|#60);",
                                        @"&(gt|#62);",
                                        @"&(nbsp|#160);",
                                        @"&(iexcl|#161);",
                                        @"&(cent|#162);",
                                        @"&(pound|#163);",
                                        @"&(copy|#169);",
                                        @"&#(\d+);",
                                        @"-->",
                                        @"<!--.*\n"
                                    };
                string[] Replaces =
                                    {
                                        "",
                                        "",
                                        "",
                                        "\"",
                                        "&",
                                        "<",
                                        ">",
                                        " ",
                                        "\xa1", //chr(161),
                                        "\xa2", //chr(162),
                                        "\xa3", //chr(163),
                                        "\xa9", //chr(169),
                                        "",
                                        "\r\n",
                                        ""
                                    };
                string s = HTML;
                for (int i = 0; i < Regexs.Length; i++)
                {
                    s = new Regex(Regexs[i], RegexOptions.Multiline | RegexOptions.IgnoreCase).Replace(s, Replaces[i]);
                }
                // string.Replace returns a new string, so the result must be assigned back.
                s = s.Replace("<", "");
                s = s.Replace(">", "");
                s = s.Replace("\r\n", "");
                return s;
            }
        } 
    http://hi.baidu.com/linsen309/blog/item/0ec5eb241cbc55348644f9ee.html
      

  4.   

    Tried wxg22526451's method. wxg22526451 is awesome.
      

  5.   

    string regexstr = @"<[^>]*>";
    // Remove every tag, then drop the non-breaking-space entities.
    context = Regex.Replace(context, regexstr, string.Empty, RegexOptions.IgnoreCase);
    context = context.Replace("&nbsp;", "");
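
Replacing "&nbsp;" by hand covers only one entity. As a possible alternative, the sketch below uses System.Net.WebUtility.HtmlDecode to decode entities after the tags are removed. PlainText and FromHtml are hypothetical names, and turning the non-breaking space into a normal space (rather than deleting it, as the snippet above does) is my own choice.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class PlainText
{
    public static string FromHtml(string html)
    {
        // Remove the tags first, as in the reply above.
        string text = Regex.Replace(html ?? string.Empty, @"<[^>]*>", string.Empty);

        // Decode entities such as &amp; and &nbsp; in one call.
        text = WebUtility.HtmlDecode(text);

        // &nbsp; decodes to U+00A0; turn it into an ordinary space.
        return text.Replace('\u00A0', ' ');
    }
}
```

With this, PlainText.FromHtml("<p>A&nbsp;&amp;&nbsp;B</p>") yields "A & B". Since decoding can bring back literal "<" and "&" characters, the result should be encoded again (for example with WebUtility.HtmlEncode) before it is written into a page.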
      

  6.   

    This method works well. I ran into a similar problem and solved it with wxg22526451's method.