How to remove HTML tags with a regular expression

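I'm building a website where the news text is read from a content column (one I created myself) in the database. The content contains HTML tags such as <p></p> and <span></span>. The news list shows only the first part of each article, so the text has to be truncated. Truncation counts real characters only after these tags are removed. Otherwise, if the 20th character happens to be "<", for example, what gets displayed is no longer readable text. I need to strip these tags on the server side. How can I do that? Any experts out there?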

Solutions


Try (?<=\<.+\>)[^\<]+(?=\</.+\>), which pulls out the text between an opening tag and a closing tag.
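A minimal sketch of how the lookaround pattern could be used from C#, wrapped in a hypothetical TagContent.Extract helper (the class and method names are illustrative, not from the thread). It relies on .NET supporting variable-length lookbehind, so (?<=\<.+\>) compiles as written.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, named only for this example.
public static class TagContent
{
    // The pattern quoted above: text preceded by "<...>" and directly followed by "</...>".
    private static readonly Regex InnerText =
        new Regex(@"(?<=\<.+\>)[^\<]+(?=\</.+\>)");

    // Returns every run of text that sits immediately before a closing tag.
    public static List<string> Extract(string html)
    {
        var parts = new List<string>();
        foreach (Match m in InnerText.Matches(html))
        {
            parts.Add(m.Value);
        }
        return parts;
    }
}
```

For "<p>One</p><p>Two</p>" this returns "One" and "Two". For "<p>Hi <b>there</b></p>" it returns only "there": "Hi " is followed by an opening tag rather than a closing one, so the lookahead fails and that text is lost. For the truncation problem in the question, deleting the tags is usually simpler than extracting what lies between them.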
// html: the string read from the content column
string str = System.Text.RegularExpressions.Regex.Replace(html, @"<[^>]*>", "");
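Applied to the original question, the one-liner above can be combined with the truncation step: strip the tags first, then cut the plain text, so the cut can no longer land inside a tag. This is a minimal sketch; NewsSummary, Summarize and maxLength are hypothetical names chosen for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, named only for this example.
public static class NewsSummary
{
    // Same pattern as the one-liner: "<", then any non-">" characters, then ">".
    private static readonly Regex Tag = new Regex("<[^>]*>");

    // Removes the tags, then keeps at most maxLength characters of the remaining text.
    public static string Summarize(string html, int maxLength)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;
        string text = Tag.Replace(html, string.Empty);
        return text.Length <= maxLength ? text : text.Substring(0, maxLength);
    }
}
```

For example, Summarize("<p>Hello <span>world</span></p>", 7) returns "Hello w". Because the tags are gone before Substring runs, the character count matches what the reader actually sees.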
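// Remove <script> blocks; Singleline lets "." match line breaks inside the script
Htmlstring = Regex.Replace(Htmlstring, @"<script[^>]*?>.*?</script>", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
// Remove all remaining tags
Htmlstring = Regex.Replace(Htmlstring, @"<(.[^>]*)>", "", RegexOptions.IgnoreCase);
// Remove line breaks that are followed by further whitespace (such as indentation)
Htmlstring = Regex.Replace(Htmlstring, @"([\r\n])[\s]+", "", RegexOptions.IgnoreCase);
// Remove leftover comment terminators
Htmlstring = Regex.Replace(Htmlstring, @"-->", "", RegexOptions.IgnoreCase);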
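Whether the <script> pattern needs RegexOptions.Singleline can be checked directly: without it, "." does not match '\n', so a script block spanning several lines is not removed. A small sketch comparing both options (ScriptRemoval and its methods are hypothetical names for this example):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, named only for this example.
public static class ScriptRemoval
{
    // Without Singleline, ".*?" stops at '\n', so a multi-line <script> block
    // is not matched and its code stays in the text.
    public static string WithoutSingleline(string html) =>
        Regex.Replace(html, @"<script[^>]*?>.*?</script>", "", RegexOptions.IgnoreCase);

    // With Singleline, ".*?" may cross line breaks and the whole block is removed.
    public static string WithSingleline(string html) =>
        Regex.Replace(html, @"<script[^>]*?>.*?</script>", "",
                      RegexOptions.IgnoreCase | RegexOptions.Singleline);
}
```

On "a<script>\nvar x = 1;\n</script>b" the first method returns the input unchanged, while the second returns "ab".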
/// <summary>
/// Removes HTML tags.
/// </summary>
/// <param name="HTML">Source HTML</param>
/// <returns>The plain-text result</returns>
public static string StripHTML(string HTML) // found by googling "StripHTML"; requires using System.Text.RegularExpressions;
{
    string[] Regexs =
    {
        @"<script[^>]*?>[\s\S]*?</script>", // <script> blocks ([\s\S] also matches line breaks)
        @"<(\/\s*)?!?((\w+:)?\w+)(\w+(\s*=?\s*(([""'])(\\[""'tbnr]|[^\7])*?\7|\w+)|.{0})|\s)*?(\/\s*)?>", // tags
        @"([\r\n])[\s]+",
        @"&(quot|#34);",
        @"&(amp|#38);",
        @"&(lt|#60);",
        @"&(gt|#62);",
        @"&(nbsp|#160);",
        @"&(iexcl|#161);",
        @"&(cent|#162);",
        @"&(pound|#163);",
        @"&(copy|#169);",
        @"&#(\d+);",  // any other numeric entity is dropped
        @"-->",       // turned into a line break so the next pattern ends there
        @"<!--.*\n"   // removes "<!--" up to the end of that line
    };
    string[] Replaces =
    {
        "",
        "",
        "",
        "\"",
        "&",
        "<",
        ">",
        " ",
        "\xa1", // chr(161)
        "\xa2", // chr(162)
        "\xa3", // chr(163)
        "\xa9", // chr(169)
        "",
        "\r\n",
        ""
    };
    string s = HTML;
    for (int i = 0; i < Regexs.Length; i++)
    {
        s = new Regex(Regexs[i], RegexOptions.Multiline | RegexOptions.IgnoreCase).Replace(s, Replaces[i]);
    }
    // string.Replace returns a new string, so the result has to be assigned back
    s = s.Replace("<", "");
    s = s.Replace(">", "");
    s = s.Replace("\r\n", "");
    return s;
}
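StripHTML above decodes a hand-picked list of entities and drops every other numeric entity. As an alternative (a substitution of mine, not part of the original answer), .NET's System.Net.WebUtility.HtmlDecode decodes named and numeric entities in one call. A minimal sketch, assuming the simple "<[^>]*>" tag pattern is good enough for the content; PlainText.FromHtml is a hypothetical name:

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class, named only for this example.
public static class PlainText
{
    private static readonly Regex Tag = new Regex("<[^>]*>");

    // Strip tags first, then decode entities. Decoding first would turn "&lt;b&gt;"
    // (text the author meant to display) into "<b>", which the tag regex would then delete.
    public static string FromHtml(string html)
    {
        string noTags = Tag.Replace(html ?? string.Empty, string.Empty);
        return WebUtility.HtmlDecode(noTags);
    }
}
```

Note that HtmlDecode turns &nbsp; into the non-breaking space U+00A0 rather than an ordinary space, so code that later removes or counts spaces should account for it.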
http://hi.baidu.com/linsen309/blog/item/0ec5eb241cbc55348644f9ee.html
I tried wxg22526451's method. Nice one, wxg22526451!
string regexstr = @"<[^>]*>";
context = Regex.Replace(context, regexstr, string.Empty, RegexOptions.IgnoreCase);
context = context.Replace(" ", ""); // also removes every remaining space
This method works well. I ran into a similar problem and solved it with wxg22526451's method.
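The last line of that snippet, context.Replace(" ", ""), deletes every space. That is harmless for Chinese text but joins English words ("Hello world" becomes "Helloworld"). One alternative, sketched here as an assumption rather than part of the original answer, is to collapse each whitespace run into a single space. Whitespace.Collapse is a hypothetical name:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, named only for this example.
public static class Whitespace
{
    // Collapses any run of whitespace (spaces, tabs, line breaks, and U+00A0,
    // which .NET's \s includes via \p{Z}) into one space, then trims both ends.
    public static string Collapse(string text) =>
        Regex.Replace(text, @"\s+", " ").Trim();
}
```

For example, Collapse("  Hello \r\n   world\u00A0!  ") returns "Hello world !".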