In C#, how can I convert a web page file to a plain text file, filtering out the page markup and keeping only the article content?

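// Requires: using System.Text.RegularExpressions;
string text = "page content";
// Capture the text runs that sit between tags.
MatchCollection mc = Regex.Matches(text, @"(?<=<(?!a)[^>]*>|^)(?!\s+<)[^<>]+(?=<(?!/a)[^>]*>|$)", RegexOptions.IgnoreCase);
string result = ""; // must be initialized before += can be used
foreach (Match m in mc)
{
    result += m.Value;
}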
This regex should do what the OP wants. I saw root post it last night and bookmarked it.
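// Remove script and style blocks together with their contents.
repcontent = Regex.Replace(repcontent, "<script([\\s\\S]+?)</script>", "", RegexOptions.IgnoreCase);
repcontent = Regex.Replace(repcontent, "<style([\\s\\S]+?)</style>", "", RegexOptions.IgnoreCase);
// Remove every remaining tag.
repcontent = Regex.Replace(repcontent, "<(.|\n)+?>", "", RegexOptions.IgnoreCase);
// Remove all whitespace, including line breaks.
repcontent = Regex.Replace(repcontent, "(\\s+?)", "");
repcontent = repcontent.Replace(" ", "");
This is the method I use for filtering.
It keeps only the text, including Chinese and English punctuation.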
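One thing to watch with the steps above: deleting every whitespace character also glues English words together, so public static class would come out as publicstaticclass. Below is a minimal sketch of a gentler variant, assuming you want word boundaries preserved. It replaces each tag with a space and collapses runs of whitespace into a single space instead of deleting them. HtmlTextSketch and StripTags are hypothetical names, not from this thread. Regexes are not a full HTML parser, so treat this as a rough filter only.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTextSketch
{
    // Hypothetical helper (not from the thread): strips markup but keeps word boundaries.
    public static string StripTags(string html)
    {
        // Drop script and style blocks together with their contents.
        string s = Regex.Replace(html, @"<script[\s\S]*?</script>", " ", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"<style[\s\S]*?</style>", " ", RegexOptions.IgnoreCase);

        // Replace each remaining tag with a space so neighbouring words stay apart.
        s = Regex.Replace(s, @"<[^>]+>", " ");

        // Decode entities such as &nbsp; and &lt; into plain characters.
        s = WebUtility.HtmlDecode(s);

        // Collapse every run of whitespace into one space, then trim the ends.
        s = Regex.Replace(s, @"\s+", " ");
        return s.Trim();
    }
}
```

Calling HtmlTextSketch.StripTags on a fragment like `<span style="color:blue">static</span> <span style="color:blue">class</span>` should give the words separated by a single space. For Chinese-only text the extra spaces between fragments may be unwanted, in which case replacing tags with an empty string is the simpler choice.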
For example: public</span> <span style="color:blue">static</span> <span style="color:blue">class</span>
On the web page this reads: public static class, but after processing it became:
public
static
class
You must have added the line breaks yourself.
foreach (Match m in mc)
{
    result += m.Value; // don't append a newline here; if you need a separator, append a space instead
}
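The advice above (join the matches with a space rather than a newline) can be sketched as follows. This is only an illustration: MatchJoinSketch and ExtractText are hypothetical names, and the pattern is the one quoted earlier in this thread, used unchanged. A StringBuilder stands in for repeated string concatenation.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class MatchJoinSketch
{
    // Hypothetical helper (not from the thread): joins the matched text runs with single spaces.
    public static string ExtractText(string html)
    {
        // Pattern copied from the earlier reply; it captures text runs between tags.
        MatchCollection mc = Regex.Matches(
            html,
            @"(?<=<(?!a)[^>]*>|^)(?!\s+<)[^<>]+(?=<(?!/a)[^>]*>|$)",
            RegexOptions.IgnoreCase);

        var sb = new StringBuilder();
        foreach (Match m in mc)
        {
            string piece = m.Value.Trim();
            if (piece.Length == 0) continue;   // skip whitespace-only runs
            if (sb.Length > 0) sb.Append(' '); // separate pieces with a space, not a newline
            sb.Append(piece);
        }
        return sb.ToString();
    }
}
```

Note the trade-off: a space is inserted between every pair of matched runs, so a word that is split by an inline tag (for example pub<b>lic</b>) would come out as two words.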