C#: a regex question about extracting matching markup from a web page

For example:
<div>1div</div>
<a>1a</a>
<p>1p</p>
<p>2p</p>
<div>2div</div>
<a>2a</a>
<p>3p</p>
<p>4p</p>
<a>3a</a>
<p>5p</p>
<div>3div</div>
<a>4a</a>
<p>6p</p>
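<span>1span</span>

The problem: there are any number of div, p, and a tags, and at most one span. I want only the contents of the p tags plus the contents of the final span. There is one condition on the p tags: a p's content is captured only when the p is directly preceded by an a tag. The span may or may not be present; capture it if it is there, and skip it if it is not.

Request: a C# regular expression for this. Thanks.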

Solutions »


(?is)(?<=</a>\s*)<p[^>]*>.*?</p>
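// "tag" is an arbitrary group name; it records which element (p or span) was opened
foreach (Match m in Regex.Matches(yourstr, @"(?ins)(?<=(</a>\s*<(?<tag>p)[^>]*>|<(?<tag>span)[^>]*>))[\s\S]+?(?=</\k<tag>)"))
{
    Console.WriteLine(m.Value); // this is the result you want
}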
Result:
1p
3p
5p
6p
1span
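Or use:
// "tag" is an arbitrary group name; "data" holds the text between the opening and closing tag
foreach (Match m in Regex.Matches(yourHtml, @"(?is)(</a>\s*<(?<tag>p)[^>]*>|<(?<tag>span)[^>]*>)(?<data>[\s\S]+?)</\k<tag>"))
{
    Console.WriteLine(m.Groups["data"].Value);
}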
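To try the capture-group approach against the sample markup from the question, a minimal console sketch might look like the following. The class name `Program`, the helper `Extract`, and the group name `tag` are assumptions made for illustration; only the overall shape of the pattern comes from the reply above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Program
{
    // Either a <p> that directly follows </a>, or a <span>; "tag" (an assumed name)
    // remembers which element was opened so the matching close tag can be required.
    const string Pattern =
        @"(?is)(</a>\s*<(?<tag>p)[^>]*>|<(?<tag>span)[^>]*>)(?<data>[\s\S]+?)</\k<tag>";

    // Hypothetical helper: returns the inner text of each qualifying element, in document order.
    public static List<string> Extract(string html)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(html, Pattern))
        {
            results.Add(m.Groups["data"].Value);
        }
        return results;
    }

    public static void Main()
    {
        // The sample markup from the question, joined with newlines.
        string html = string.Join("\n", new[]
        {
            "<div>1div</div>", "<a>1a</a>", "<p>1p</p>", "<p>2p</p>",
            "<div>2div</div>", "<a>2a</a>", "<p>3p</p>", "<p>4p</p>",
            "<a>3a</a>", "<p>5p</p>", "<div>3div</div>", "<a>4a</a>",
            "<p>6p</p>", "<span>1span</span>"
        });

        foreach (string value in Extract(html))
        {
            Console.WriteLine(value);
        }
    }
}
```

On the sample input this should print 1p, 3p, 5p, 6p and 1span, one per line, which is the result quoted in the reply. Because a span is only one branch of the alternation, input without a span simply yields the p values.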
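        #region Get the values between a start string and an end string
        /// <summary>
        /// Gets every value found between a start marker and an end marker.
        /// </summary>
        /// <param name="begin">Start marker (inserted into the pattern as a regex)</param>
        /// <param name="end">End marker (inserted into the pattern as a regex)</param>
        /// <param name="html">HTML string</param>
        /// <returns>The matches for the text in between</returns>
        public static MatchCollection GetMidValue(string begin, string end, string html)
        {
            // [\s\S] matches any character, line breaks included; a "." inside [] is only a literal dot
            Regex reg = new Regex("(?<=(" + begin + "))[\\s\\S]*?(?=(" + end + "))", RegexOptions.Multiline | RegexOptions.Singleline);
            return reg.Matches(html);
        }
        #endregion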
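As a rough usage sketch for the helper above: the markers are concatenated straight into the pattern, so they are interpreted as regex fragments. The class name `MidValueDemo` and the sample input are assumptions made for illustration, and the helper is repeated here only so the snippet stands alone.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MidValueDemo
{
    // Copy of the helper from the reply, so this sketch compiles on its own.
    public static MatchCollection GetMidValue(string begin, string end, string html)
    {
        Regex reg = new Regex("(?<=(" + begin + "))[\\s\\S]*?(?=(" + end + "))",
            RegexOptions.Multiline | RegexOptions.Singleline);
        return reg.Matches(html);
    }

    public static void Main()
    {
        string html = "<p>1p</p>\n<p>2p</p>";

        // "<p>" and "</p>" contain no regex metacharacters, so they can be passed as-is.
        foreach (Match m in GetMidValue("<p>", "</p>", html))
        {
            Console.WriteLine(m.Value);
        }
    }
}
```

If a marker is meant literally and contains characters such as `(`, `[`, `.` or `?`, wrapping it in `Regex.Escape` before the call keeps those characters from being read as regex syntax.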
Here is one I use myself. It works very well and strips the HTML out very cleanly:
public static string NoHTML(string Htmlstring)
        {
            // Remove scripts (Singleline lets "." span line breaks inside a script block)
            Htmlstring = Regex.Replace(Htmlstring, @"<script[^>]*?>.*?</script>", "",
                RegexOptions.IgnoreCase | RegexOptions.Singleline);
            // Remove HTML tags
            Htmlstring = Regex.Replace(Htmlstring, @"<(.[^>]*)>", "",
                RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"([\r\n])[\s]+", "",
                RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"-->", "", RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"<!--.*", "", RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"&(quot|#34);", "\"",
                RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"&(amp|#38);", "&",
                RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"&(lt|#60);", "<",
                RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"&(gt|#62);", ">",
                RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"&(nbsp|#160);", "   ",
                RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"&(iexcl|#161);", "\xa1",
                RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"&(cent|#162);", "\xa2",
                RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"&(pound|#163);", "\xa3",
                RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"&(copy|#169);", "\xa9",
                RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"&#(\d+);", "",
                RegexOptions.IgnoreCase);
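            // string.Replace returns a new string, so the result has to be assigned back
            Htmlstring = Htmlstring.Replace("<", "");
            Htmlstring = Htmlstring.Replace(">", "");
            Htmlstring = Htmlstring.Replace("\r\n", "");
            Htmlstring = System.Web.HttpContext.Current.Server.HtmlEncode(Htmlstring).Trim();
            return Htmlstring;
        }

This article comes from E学院. Original URL: http://www.ip8000.com/Programming/NET/Csharp/201010/65071.html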