A regex problem that has me stumped

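Regex has always been my weak spot, and today I ran into another problem:
I need to extract a specific part of a piece of HTML, for example: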
<div id="Title"><a href="/">Free Articles</a></div>
<h2>Free articles to be reprinted or published.</h2>
<div>sfdsfdsfdsfasfa</div>
<h2>afasfasfasf</h2>
I want to extract the "Free articles to be reprinted or published" part.
I can't use an approach like (?<=<h2>)(.*?)(?=<h2>)

Solutions »

(?<=<(?!h2)[^>]*>|^)(?!\s+<)[^<>]+(?=<(?!/h2)[^>]*>|$)
I'd like to make it more general, so that it can extract a chosen part from any HTML.
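Try this:
System.Text.RegularExpressions.Match match2 = Regex.Match(sourceString, "Title.*?<h2>(?<value1>.*?)</h2>");
To read the value:
match2.Groups["value1"].Value
Note: by default . does not match a newline, so if the source string contains line breaks, strip them first (or pass RegexOptions.Singleline).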
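As a rough sketch of the alternative to stripping line breaks, the same pattern can be run with RegexOptions.Singleline. The class and method names here (TitleH2Extractor, Extract) are made up for illustration and are not from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleH2Extractor
{
    // Hypothetical helper: returns the text of the first <h2> that follows
    // the literal "Title", or null when nothing matches.
    public static string Extract(string html)
    {
        // Singleline lets '.' match '\n' as well, so multi-line HTML
        // can be searched without removing line breaks first.
        Match m = Regex.Match(
            html,
            "Title.*?<h2>(?<value1>.*?)</h2>",
            RegexOptions.Singleline);
        return m.Success ? m.Groups["value1"].Value : null;
    }
}
```

Because the pattern starts at "Title", an <h2> that appears earlier in the page is skipped, but the rule is still tied to this one page layout.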
string str = @"<div id=""Title""><a href=""/"">Free Articles</a></div>
<h2>Free articles to be reprinted or published.</h2>
<div>sfdsfdsfdsfasfa</div>
<h2>afasfasfasf</h2>";
// The lookbehind anchors right after an opening <h2 ...>; the body stops before the next <h2 or </h2.
Regex reg = new Regex(@"(?<=<h2[^>]*?>)(?:(?!</?h2).)*");
Response.Write(reg.Match(str).Value);
// Output: Free articles to be reprinted or published.
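Sorry, it seems the original poster doesn't want to use regex. In that case the only option is something like IndexOf combined with Substring, and that doesn't seem to be fully general either.
I do want to use regex, but I want a more general pattern. For example: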
<h2>afasfasfasf</h2>
<div id="Title"><a href="/">Free Articles</a></div>
<h2>Free articles to be reprinted or published.</h2>
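<div>sfdsfdsfdsfasfa</div>
With this input I still want to get "Free articles to be reprinted or published." What I'm really after is something like a content scraper, where the extraction rules can be configured freely.
So what rule defines the part you want? Is it mainly that it comes after div id="title"?
Why not take: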
afasfasfasf
Free Articles
sfdsfdsfdsfasfa
There has to be some rule!
Just because the content you want looks like an English sentence? And it has to be at least two words?
Free articles to be reprinted or published.
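I've seen a very good scraper program; I think it's called 火车头 (Huochetou). It lets you define custom rules, for example:
<div(*)>[content]</div>
and it extracts the content. That is the effect I want for my current project.
<div(*)>[content]</div>
I haven't used 火车头. If the div has several levels of nested divs inside it, I wonder what 火车头 would extract?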
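What you actually need to do is generate simple regexes.
Something like this:
abc*ssss{url}ddddd*bbb
Here * stands for any characters and {url} stands for a regex matching ordinary URLs. Substitute the URL regex for {url}, then replace each * with a pattern that matches any characters.
Generating arbitrary regexes is very hard, but generating them for one particular class of patterns is easy enough.
The spiders on open-source sites work basically the same way as 火车头; you can download one and take a look.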
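<div(*)>[content]</div>
This should count as a "presentation layer" thing, not a real regular expression. In the business layer you need to convert it into a real regex and run the search with that; how you convert it depends entirely on what you want to get.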
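The conversion from a presentation-layer rule to a real regex could look roughly like the sketch below. It assumes two placeholder tokens only: (*) for "any attributes inside the tag" and [content] for the part to capture (the thread writes it as [内容]). The RuleCompiler name and the chosen replacements are illustrative guesses, not how 火车头 actually works.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RuleCompiler
{
    // Hypothetical converter: turns a rule such as "<div(*)>[content]</div>"
    // into a Regex with a named group "content".
    public static Regex Compile(string rule)
    {
        // Escape the whole rule first so the literal HTML is matched verbatim.
        string pattern = Regex.Escape(rule);

        // Regex.Escape turns "(*)" into "\(\*\)" and "[content]" into "\[content]"
        // (the closing bracket is not escaped), so those are the strings to swap out.
        pattern = pattern.Replace(@"\(\*\)", @"[^>]*?");            // anything up to the tag's '>'
        pattern = pattern.Replace(@"\[content]", @"(?<content>.*?)"); // lazy capture

        // Singleline lets the captured content span several lines.
        return new Regex(pattern, RegexOptions.Singleline);
    }
}
```

For example, RuleCompiler.Compile("<h2(*)>[content]</h2>").Match(html).Groups["content"].Value would give the text of the first h2. Because the capture is lazy, a rule for div stops at the first closing </div>, so nested divs of the same kind are cut short.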
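You don't need regex for this. I have a scraping program, for example:
string lawfirmname = caiji.getbody(value, "<h1  id=\"H1jobname\">", "</h1>", false, false);
The arguments "<h1  id=\"H1jobname\">" and "</h1>" correspond to <h1  id="H1jobname"> and </h1> in the original page. The idea is to take the text that comes right before and right after the content you want; whatever lies between them is what gets scraped. The leading text must be unique in the page, though.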
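The post does not show what caiji.getbody does internally, so the following is only a guess at the "text between two markers" idea, written with IndexOf and Substring. GetBetween and MarkerExtractor are hypothetical names, and the two boolean flags from the original call are left out because their meaning is not given.

```csharp
using System;

public static class MarkerExtractor
{
    // Hypothetical stand-in for the marker-based approach described above.
    // Returns the text between the first startMarker and the next endMarker,
    // or null when either marker is missing.
    public static string GetBetween(string source, string startMarker, string endMarker)
    {
        int start = source.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0) return null;
        start += startMarker.Length; // skip past the leading marker

        int end = source.IndexOf(endMarker, start, StringComparison.Ordinal);
        if (end < 0) return null;

        return source.Substring(start, end - start);
    }
}
```

Since only the first occurrence of the leading marker is used, the result is wrong when that marker is not unique, which matches the caveat in the post.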
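There's a problem with this. What if I need wildcard matching inside "<h1 id=\"H1jobname\">", for example "<h1 id=\"H1jobname\"(*)>"?
> wildcard matching inside "<h1 id=\"H1jobname\">", for example "<h1 id=\"H1jobname\"(*)>"
This is the page you are scraping. Scraping means scraping someone else's pages, right? How would you add content to someone else's page???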