String extraction problem

Suppose I have the following markup:
<div class="basic" style="float:left;" id="list1a">
<a>There is one obvious advantage:</a>
<div>
<p>
You've seen it coming!<br/>
Buy now and get nothing for free!<br/>
Well, at least no free beer. Perhaps a bear,<br/>
if you can afford it.
</p>
</div>
<a>Now that you've got...</a>
<div>
<p>
your bear, you have to admit it!<br/>
No, we aren't selling bears.
</p>
</div>
<a>Rent one bear, ...</a>
<div>
<p>
get two for three beer.
</p>
<p>
<a>And now, for something completely different.</a>
<a>And now, for something completely different.</a>
<a>And now, for something completely different.</a>
<a>And now, for something completely different.</a>
<a>And now, for something completely different.</a>
And now, for something completely different.<br/>
And now, for something completely different.<br/>
And now, for something completely different.<br/>
Period.
</p>
</div>
</div>
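I want to extract strings from inside the DIV whose id is list1a: one extraction per <a></a> and one per <div></div>. The results go into a database, with the text from each a tag in one column and the text from the following div in another column of the same row.
How can I do this?
Also, what if a div contains a tags? In that case I want the entire content of the div.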

Solutions »

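Do the <a></a> and <div></div> elements you want strictly alternate? If they do, it does not matter that a div contains a tags: the lazy match for the div runs to the first </div>, so any a tags inside it are consumed as part of the div.
MatchCollection mc = Regex.Matches(str, @"(<a[^>]*>[\s\S]*?</a>)\s*(<div[^>]*>[\s\S]*?</div>)", RegexOptions.IgnoreCase);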
foreach (Match m in mc)
{
    richTextBox1.Text += m.Groups[1].Value + "\n";
    richTextBox1.Text += m.Groups[2].Value + "\n";
    richTextBox1.Text += "---------------------------\n";
}
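If you have not already isolated the DIV whose id is list1a, match that DIV first (the balancing group keeps nested <div> and </div> tags paired), then apply the method above to the result to get the a and div tags:
Match mStr = Regex.Match(str, @"<div[^>]*?id=""list1a""[^>]*>(((?<o>)<div[^>]*>|(?<-o>)</div>|(?:(?!</?div)[\s\S]))*)(?(o)(?!))</div>", RegexOptions.IgnoreCase);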
if (mStr.Success)
{
    MatchCollection mc = Regex.Matches(mStr.Value, @"(<a[^>]*>[\s\S]*?</a>)\s*(<div[^>]*>[\s\S]*?</div>)", RegexOptions.IgnoreCase);
    foreach (Match m in mc)
    {
        richTextBox1.Text += m.Groups[1].Value + "\n";
        richTextBox1.Text += m.Groups[2].Value + "\n";
        richTextBox1.Text += "---------------------------\n";
    }
}
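To see why the outer DIV is matched with a balancing group rather than a plain lazy match, here is a minimal sketch comparing the two. The class name OuterDivDemo, the method MatchOuter, and the Lazy pattern are hypothetical names introduced only for this illustration; the Balanced pattern is the one from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OuterDivDemo
{
    // Pattern from the answer: the named group "o" is pushed on every nested
    // <div ...> and popped on every </div>, and (?(o)(?!)) rejects the match
    // while any nested <div> is still open.
    public const string Balanced =
        @"<div[^>]*?id=""list1a""[^>]*>(((?<o>)<div[^>]*>|(?<-o>)</div>|(?:(?!</?div)[\s\S]))*)(?(o)(?!))</div>";

    // Hypothetical naive alternative for comparison: stops at the FIRST </div>.
    public const string Lazy = @"<div[^>]*?id=""list1a""[^>]*>[\s\S]*?</div>";

    // Returns the matched text, or null when the pattern does not match.
    public static string MatchOuter(string html, string pattern)
    {
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Value : null;
    }
}
```

For an input such as <div id="list1a"><a>t</a><div><p>x</p></div></div><div>other</div>, MatchOuter with Balanced should return the whole list1a element including its closing tag, while Lazy ends at the inner </div> and cuts the element short.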
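Thanks a lot, that works. Could you tweak it a bit more, though? I only want the content inside <a></a>, without the tags themselves, and the same for <div></div>.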
Then just move the capture groups so they wrap only the content between the tags:
Match mStr = Regex.Match(str, @"<div[^>]*?id=""list1a""[^>]*>(((?<o>)<div[^>]*>|(?<-o>)</div>|(?:(?!</?div)[\s\S]))*)(?(o)(?!))</div>", RegexOptions.IgnoreCase);
if (mStr.Success)
{
    MatchCollection mc = Regex.Matches(mStr.Value, @"<a[^>]*>([\s\S]*?)</a>\s*<div[^>]*>([\s\S]*?)</div>", RegexOptions.IgnoreCase);
    foreach (Match m in mc)
    {
        richTextBox1.Text += m.Groups[1].Value + "\n";
        richTextBox1.Text += m.Groups[2].Value + "\n";
        richTextBox1.Text += "---------------------------\n";
    }
}
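Since the original question wants each a/div pair stored as one database row, the loop above could collect the pairs into a list instead of appending them to a text box. The following is only a sketch under assumptions: the class PairExtractor, the method ExtractPairs, and the tuple field names Title and Body are hypothetical, and the database insert itself is left out. It uses the same pair pattern as the answer, so it shares its limitation: an inner div that itself contains a nested div would be cut at the first </div>.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PairExtractor
{
    // Takes the HTML of the list1a container and returns one
    // (Title, Body) tuple per <a>...</a> followed by <div>...</div>.
    // Each tuple corresponds to one row: Title for the "a" column,
    // Body for the "div" column.
    public static List<(string Title, string Body)> ExtractPairs(string containerHtml)
    {
        var rows = new List<(string Title, string Body)>();
        MatchCollection mc = Regex.Matches(
            containerHtml,
            @"<a[^>]*>([\s\S]*?)</a>\s*<div[^>]*>([\s\S]*?)</div>",
            RegexOptions.IgnoreCase);

        foreach (Match m in mc)
        {
            // Group 1 is the text inside <a>, group 2 the markup inside <div>.
            // Trim() drops the line breaks that surround the inner content.
            rows.Add((m.Groups[1].Value.Trim(), m.Groups[2].Value.Trim()));
        }
        return rows;
    }
}
```

Calling PairExtractor.ExtractPairs(mStr.Value) after the outer match succeeds would give a list that can be iterated to build parameterized INSERT statements, one per tuple.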