<div class="tb-detail-hd">
<h3><a href="http://detail.tmall.com/venus/spu_detail.htm?spu_id=136191697&no_switch=1&default_item_id=13133052500" target="_blank">【五折】Jack Jones杰克琼斯连帽含羊毛双层毛衣B浅211425001104</a></h3>
<p> <span>
举报此商品(<a href="http://support.taobao.com/myservice/suit/accuse_punish.jhtml?auction_num_id=13133052500&display_type=3">举报</a>)
</span>
</p>
</div>
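I want to extract the h3 under the div whose class is tb-detail-hd. My current pattern is:
reg = @"(?is)<div class=""tb-detail-hd""><h3>(<a[^>]*>)?([^<]*)(</a>)?</h3></div>";
It returns nothing.
If I write it as:
reg = "<h3>(<a[^>]*>)?([^<]*)(</a>)?</h3>";
it does extract the title, but the page has other h3 tags and they get extracted as well. Any help would be appreciated.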
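There is whitespace before the <h3>, so the pattern has to allow for it:
string s = @"<div class=""tb-detail-hd"">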
<h3><a href=""http://detail.tmall.com/venus/spu_detail.htm?spu_id=136191697&no_switch=1&default_item_id=13133052500"" target=""_blank"">【五折】Jack Jones杰克琼斯连帽含羊毛双层毛衣B浅211425001104</a></h3>
<p> <span>
举报此商品(<a href=""http://support.taobao.com/myservice/suit/accuse_punish.jhtml?auction_num_id=13133052500&display_type=3"">举报</a>)
</span>
</p>
</div>";
Match match = Regex.Match(s, @"(?is)<div\s+class=""tb-detail-hd"">\s*(<h3>.+?</h3>).*?</div>");
Response.Write(Server.HtmlEncode(match.Groups[1].Value));
Regex re = new Regex("(?is)<div\\s*class=\"tb-detail-hd\">[^<]+<h3>(.*?)</h3>.*?</div>", RegexOptions.None);
The OP's pattern also works with a small change:
Regex re = new Regex("(?is)<div\\s*class=\"tb-detail-hd\">\\s*<h3>(<a[^>]*>)?[^<]*(</a>)?</h3>.*?</div>", RegexOptions.None);
{
    string result = "";
    string reg = "";
    switch (type)
    {
        case 0: return "";
        case 1: reg = @"J_ImgBooth\b[^<>]*?\bsrc[\s\t\r\n]*=[\s\t\r\n]*[""']?[\s\t\r\n]*(?<imgUrl>[^\s\t\r\n""'<>]*)[^<>]*?/?[\s\t\r\n]*>"; break;
        //case 2: reg = "<div class=\"tb-detail-hd\"><h3>(<a[^>]*>)?([^<]*)(</a>)?</h3></div>"; break;
        case 2: reg = @"(?is)<div\s+class=""tb-detail-hd"">\s*(<h3>.+?</h3>).*?</div>"; break;
        case 3: reg = "J_StrPrice[^>]*>([^<>]*)(</)"; break;
    }
    string regex = reg;
    Regex re = new Regex(regex);
    MatchCollection matches = re.Matches(content);
    System.Collections.IEnumerator enu = matches.GetEnumerator();
    switch (type)
    {
        case 0: return "";
        case 1:
            while (enu.MoveNext() && enu.Current != null)
            {
                Match match = (Match)(enu.Current);
                result += match.Groups["imgUrl"];
            } break;
        case 2:
            while (enu.MoveNext() && enu.Current != null)
            {
                Match match = (Match)(enu.Current);
                result += match.Groups[2];
            } break;
        case 3:
            while (enu.MoveNext() && enu.Current != null)
            {
                Match match = (Match)(enu.Current);
                result += match.Groups[1];
            } break;
    }
    return result;
}
It still doesn't work. Is the problem in this method? The result is still empty.
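One possible cause, offered as an assumption because the call site is not shown: the pattern used for case 2 defines a single capture group, while the loop for case 2 reads `Groups[2]`. In .NET, asking for a group number that the pattern does not define gives back an empty group instead of throwing, so the result would stay empty even when the match itself succeeds. A minimal sketch of that effect follows; the class name `GroupIndexDemo` and the shortened sample HTML are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class GroupIndexDemo
{
    static void Main()
    {
        // Shortened stand-in for the page fragment; the real href is longer.
        string html = "<div class=\"tb-detail-hd\">\n  <h3><a href=\"#\" target=\"_blank\">Sample title</a></h3>\n  <p><span>report</span></p>\n</div>";

        // Same pattern as case 2 in the method above: exactly one capture group.
        Match m = Regex.Match(html, @"(?is)<div\s+class=""tb-detail-hd"">\s*(<h3>.+?</h3>).*?</div>");

        Console.WriteLine(m.Success);                     // the match itself succeeds
        Console.WriteLine("[" + m.Groups[1].Value + "]"); // the whole <h3>...</h3> element
        Console.WriteLine("[" + m.Groups[2].Value + "]"); // group 2 is not defined, so this is "[]"
    }
}
```

If this is what is happening, reading `Groups[1]` in case 2 (or switching to a named group) should return the h3 element.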
string strMatch = Regex.Match(strHtml, @"(?<=<div class=""tb-detail-hd"">\s*)<h3>(<a[^>]*>)?([^<]*)(</a>)?</h3>", RegexOptions.IgnoreCase).Value;
return strMatch;
reg = @"(?is)<div class=""tb-detail-hd""><h3>(<a[^>]*>)?(.*?)(</a>)?</h3></div>";
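I tried both of your patterns. The first still returns "" and the second returns "</a>". I only want the text under that div, which is the Tmall product title. I'm testing against this URL:
http://detail.tmall.com/item.htm?id=3372931960&is_b=1&cat_id=50025829&key_words=&spm=1008.1000032.1000012.16
Please help, it just won't work.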
<div class="tb-detail-hd">
<h3><a target="_blank" href="http://detail.tmall.com/venus/spu_detail.htm?spu_id=47663902&no_switch=1&default_item_id=3372931960">包快递2012春季新款圆头鞋平跟鞋浅口单鞋女牛津鞋大码女鞋娃娃鞋</a></h3>
<p> <span>
举报此商品(<a href="http://support.taobao.com/myservice/suit/accuse_punish.jhtml?auction_num_id=3372931960&display_type=3">举报</a>)
</span>
</p>
</div>
So the text you want is: 包快递2012春季新款圆头鞋平跟鞋浅口单鞋女牛津鞋大码女鞋娃娃鞋, right?
I've only just started learning this.
string str = @"<div class=""tb-detail-hd"">
<h3><a target=""_blank"" href=""http://detail.tmall.com/venus/spu_detail.htm?spu_id=47663902&no_switch=1&default_item_id=3372931960"">包快递2012春季新款圆头鞋平跟鞋浅口单鞋女牛津鞋大码女鞋娃娃鞋</a></h3>
<p> <span>
举报此商品(<a href=""http://support.taobao.com/myservice/suit/accuse_punish.jhtml?auction_num_id=3372931960&display_type=3"">举报</a>)
</span>
</p>
</div>";
string resultStr = string.Empty;
Regex re = new Regex("(?is)<div\\s*class=\"tb-detail-hd\">[^<]+<h3><a[^>]+>(.*?)</a></h3>.*?</div>", RegexOptions.None);
MatchCollection mc = re.Matches(str);
foreach (Match ma in mc)
{
    resultStr = ma.Groups[1].Value;
}
Console.WriteLine(resultStr);
Console.ReadLine();
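The patterns in this thread can be combined into one helper that returns only the title text and uses a named group, so the caller no longer depends on group numbering. This is a sketch under stated assumptions, not a definitive implementation: the names `TitleExtractor` and `ExtractTitle` are hypothetical, the class attribute is assumed to be written exactly as `class="tb-detail-hd"` right after `<div`, and the title is assumed to be plain text directly inside the h3 or inside a single `<a>`.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class TitleExtractor
{
    // Allows whitespace between the tags and makes the <a ...> wrapper optional.
    // The named group "title" captures the text up to the next tag.
    static readonly Regex TitleRegex = new Regex(
        @"<div\s+class=""tb-detail-hd""[^>]*>\s*<h3[^>]*>\s*(?:<a\b[^>]*>)?\s*(?<title>[^<]*)",
        RegexOptions.IgnoreCase);

    public static string ExtractTitle(string html)
    {
        Match m = TitleRegex.Match(html);
        if (!m.Success) return string.Empty;
        // Decode entities such as &amp; and drop surrounding whitespace.
        return WebUtility.HtmlDecode(m.Groups["title"].Value).Trim();
    }
}

class Program
{
    static void Main()
    {
        string html = @"<div class=""tb-detail-hd"">
<h3><a target=""_blank"" href=""http://detail.tmall.com/venus/spu_detail.htm?spu_id=47663902&no_switch=1&default_item_id=3372931960"">包快递2012春季新款圆头鞋平跟鞋浅口单鞋女牛津鞋大码女鞋娃娃鞋</a></h3>
<p> <span>report</span> </p>
</div>";
        // Writes the product title text without the surrounding tags.
        Console.WriteLine(TitleExtractor.ExtractTitle(html));
    }
}
```

Because the pattern is anchored on the div, other h3 elements on the page are ignored. If the title ever contains nested tags, the captured text stops at the first of them.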