How can I quickly extract the text from HTML in C#?

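I didn't know how to do the conversion, so I searched online and found a regular-expression-based function named StripHTML. It is extremely slow and the results are poor; the following two lines are faster and give visibly better output:
string temp = Regex.Replace(strHtml, "<[^>]*>", "");
return temp.Replace("&nbsp;", " ");
What I want to know: does Microsoft's .NET SDK really not include this kind of functionality?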

Solutions »


.NET doesn't provide a library for HTML parsing. Here are a few ways you can try:
1. Use regular expressions.
2. Use SgmlReader and manipulate your HTML in XML format:
http://www.eggheadcafe.com/articles/20030317.asp
3. Use the WebBrowser control.
4. Use mshtml:
http://www.codeguru.com/vb/vb_internet/html/article.php/c4815/
Also see Majestic-12 : Projects : HTML parser (C# .NET):
http://www.majestic12.co.uk/projects/html_parser.php
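As a rough illustration of option 1, here is a minimal sketch that keeps the tag pattern from the question but compiles it once and decodes entities with System.Net.WebUtility.HtmlDecode instead of replacing only &nbsp; by hand. The class and method names (RegexHtmlText, Strip) are made up for this example. It assumes reasonably simple markup: it does not skip the contents of script or style elements, and a ">" inside an attribute value or comment will confuse it.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the .NET class library.
public static class RegexHtmlText
{
    // Compiled once and reused; matches anything that looks like a tag.
    private static readonly Regex TagPattern =
        new Regex("<[^>]*>", RegexOptions.Compiled);

    public static string Strip(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // Remove everything that looks like <...>.
        string withoutTags = TagPattern.Replace(html, "");

        // Turn entities such as &amp; and &lt; back into characters.
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // &nbsp; decodes to U+00A0; map it to an ordinary space.
        return decoded.Replace('\u00A0', ' ');
    }
}
```

For example, RegexHtmlText.Strip("<p>Hello&nbsp;<b>world</b> &amp; more</p>") should give "Hello world & more".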
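Don't the OP's two lines of code work? Regex is already fairly fast. If you need it faster still, you'll have to write the low-level code yourself: for example, scan the source string once char by char while building up a StringBuilder, appending only the characters you want to keep.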
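Microsoft's own .NET class library contains plenty of examples of high-performance code. Use a decompiler such as Reflector to see how string-processing functions like MS's HtmlEncode work; they are well worth studying.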
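The single-pass scan suggested above could look roughly like this. It is only a sketch: the names HtmlTextScanner and StripTags are invented for illustration, and it assumes every "<" opens a tag, so an unterminated "<" swallows the rest of the input (unlike the regex, which would leave it in place). It does not decode entities or skip script and style contents.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of the .NET class library.
public static class HtmlTextScanner
{
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // The output can never be longer than the input, so size the buffer once.
        var sb = new StringBuilder(html.Length);
        bool insideTag = false;

        foreach (char c in html)
        {
            if (c == '<')
            {
                insideTag = true;       // start of a tag: stop copying
            }
            else if (c == '>' && insideTag)
            {
                insideTag = false;      // end of a tag: resume copying
            }
            else if (!insideTag)
            {
                sb.Append(c);           // ordinary text outside any tag
            }
        }

        return sb.ToString();
    }
}
```

For example, HtmlTextScanner.StripTags("<p>Hello <b>world</b></p>") yields "Hello world". Whether this actually beats a compiled Regex on your data is worth measuring rather than assuming.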
I used query.dll, which ships with the system. It is very capable and can also parse Word and other Office documents.