<span class="m undline"><a onmousedown=$Any()href="http://detail.china.com/buyer/offerdetail/$Get().html" onclick=$Any() target="_blank" class="l">$Get()</a>$Any()<span class="nobr int_gray">$Get()</span>$Any()<span class="gray s">
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleCSharp
{
    class Program
    {
        static void Main(string[] args)
        {
            // The rule: $Any() skips arbitrary text, $Get() marks a value to extract.
            string source = @"<span class=""m undline""><a onmousedown=$Any()href=""http://detail.china.com/buyer/offerdetail/$Get().html"" onclick=$Any() target=""_blank"" class=""l"">$Get()</a>$Any()<span class=""nobr int_gray"">$Get()</span>$Any()<span class=""gray s"">";
            // Match every "$name()" placeholder; the named group captures the text between '$' and "()".
            Regex reg = new Regex(@"\$(?<name>(?:(?!\(\)).)*)\(\)");
            MatchCollection mc = reg.Matches(source);
            // Print the placeholder names in order of appearance (Any, Get, Any, ...).
            foreach (Match m in mc)
                Console.WriteLine(m.Groups["name"].Value);
        }
    }
}
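The $Any() and $Get() at the very top are two functions I defined myself: $Any() matches arbitrary characters, and $Get() extracts a value.
For page markup such as the sample, I combine these two functions to write content-extraction rules.
You can compare the rule and the sample side by side: for example, "aliclick" is replaced with $Any(), and $Get() extracts 625249548.
The problem now is how to pull the literal marker strings out of the rule at the top.
Requirements:
1 Get the string before the first $Any(): <span class="m undline"><a onmousedown=
2 Get the string between the first $Any() and the first $Get(): href="http://detail.china.com/buyer/offerdetail/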
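One possible way to meet the two requirements above is to split the rule on the placeholders themselves, so that what remains is the literal text between them. The following is only a sketch: the class name RuleSegments and the method Split are hypothetical names introduced for illustration, and the rule is shortened to its first two placeholders.

```csharp
using System;
using System.Text.RegularExpressions;

static class RuleSegments
{
    // Splits a rule on the $Any() / $Get() placeholders and returns
    // the literal text between them, in order of appearance.
    // The group is non-capturing, so the placeholders themselves
    // are not included in the result.
    public static string[] Split(string rule)
    {
        return Regex.Split(rule, @"\$(?:Any|Get)\(\)");
    }
}

class Demo
{
    static void Main()
    {
        // Shortened version of the rule from the question.
        string rule = @"<span class=""m undline""><a onmousedown=$Any()href=""http://detail.china.com/buyer/offerdetail/$Get().html""";
        string[] parts = RuleSegments.Split(rule);
        Console.WriteLine(parts[0]); // <span class="m undline"><a onmousedown=
        Console.WriteLine(parts[1]); // href="http://detail.china.com/buyer/offerdetail/
    }
}
```

Because the split pattern names only Any and Get, any other $...() text in a rule would be left inside a literal segment; whether that is desirable depends on how the rule language is meant to grow.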
I have not finished describing my problem; the above is only one rule!
----------------------
Sample: <span class="m undline"><a onmousedown="aliclick" href="http://detail.china.com/buyer/offerdetail/625249548.html" onclick="postKeywords " target="_blank" class="l">火龙果</a></span> <span class="nobr int_gray">03/17</span><br /> <span class="gray s"><span class="m undline"><a onmousedown="aliclick" href="http://detail.china.com/buyer/offerdetail/625249548.html" onclick="postKeywords " target="_blank" class="l">火龙果</a></span> <span class="nobr int_gray">03/17</span><br /> <span class="gray s">
------------------
Result: <span class="m undline"><a onmousedown="aliclick" href="http://detail.china.com/buyer/offerdetail/625249548.html" onclick="postKeywords " target="_blank" class="l">火龙果</a></span> <span class="nobr int_gray">03/17</span><br /> <span class="gray s"> is match
Group[0]=<span class="m undline"><a onmousedown="aliclick" href="http://detail.china.com/buyer/offerdetail/625249548.html" onclick="postKeywords " target="_blank" class="l">火龙果</a></span> <span class="nobr int_gray">03/17</span><br /> <span class="gray s">
Group[1]=625249548
Group[2]=03/17
<span class="m undline"><a onmousedown="aliclick" href="http://detail.china.com/buyer/offerdetail/625249548.html" onclick="postKeywords " target="_blank" class="l">火龙果</a></span> <span class="nobr int_gray">03/17</span><br /> <span class="gray s"> is match
Group[0]=<span class="m undline"><a onmousedown="aliclick" href="http://detail.china.com/buyer/offerdetail/625249548.html" onclick="postKeywords " target="_blank" class="l">火龙果</a></span> <span class="nobr int_gray">03/17</span><br /> <span class="gray s">
Group[1]=625249548
Group[2]=03/17
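Output like the above can be obtained by translating a rule into a regular expression. The sketch below assumes one plausible translation, which the thread does not confirm: literal text is escaped, $Any() becomes a lazy wildcard, and $Get() becomes a lazy capturing group. RuleCompiler and ToPattern are hypothetical names, and the rule is a shortened variant with two $Get() placeholders so that it yields the same two groups as the result shown.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class RuleCompiler
{
    // Translates a rule into a regex pattern (assumed semantics):
    //   literal text -> escaped verbatim
    //   $Any()       -> .*?    (skip arbitrary text, not captured)
    //   $Get()       -> (.*?)  (capture a value)
    public static string ToPattern(string rule)
    {
        var sb = new StringBuilder();
        int pos = 0;
        foreach (Match m in Regex.Matches(rule, @"\$(Any|Get)\(\)"))
        {
            // Escape the literal text that precedes this placeholder.
            sb.Append(Regex.Escape(rule.Substring(pos, m.Index - pos)));
            sb.Append(m.Groups[1].Value == "Get" ? "(.*?)" : ".*?");
            pos = m.Index + m.Length;
        }
        // Escape whatever literal text follows the last placeholder.
        sb.Append(Regex.Escape(rule.Substring(pos)));
        return sb.ToString();
    }
}

class Demo
{
    static void Main()
    {
        // Shortened rule (an assumption made for this example).
        string rule = @"<a onmousedown=$Any()href=""http://detail.china.com/buyer/offerdetail/$Get().html""$Any()<span class=""nobr int_gray"">$Get()</span>";
        // One record taken from the sample in the question.
        string sample = @"<span class=""m undline""><a onmousedown=""aliclick"" href=""http://detail.china.com/buyer/offerdetail/625249548.html"" onclick=""postKeywords "" target=""_blank"" class=""l"">火龙果</a></span> <span class=""nobr int_gray"">03/17</span><br /> <span class=""gray s"">";

        // Singleline lets '.' cross line breaks in multi-line pages.
        Regex reg = new Regex(RuleCompiler.ToPattern(rule), RegexOptions.Singleline);
        foreach (Match m in reg.Matches(sample))
        {
            Console.WriteLine("Group[1]=" + m.Groups[1].Value); // Group[1]=625249548
            Console.WriteLine("Group[2]=" + m.Groups[2].Value); // Group[2]=03/17
        }
    }
}
```

Lazy quantifiers keep each placeholder from running past the next literal marker, which is why the literal segments of the rule matter. A $Get() at the very end of a rule would capture an empty string under this translation, so such a rule would need a trailing literal.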