productListing">
<tr>
<td align="left" class="productListing-data"><a style="font-size:8pt" href="http://www.replicamaster.com/datejast-rolex-36mm-silver-jubilee-bracelet-swiss-watch-p-4369.html?osCsid=2e69ba852cd807339d1117e15739f428"><img src="images/watches-25082011-098-rm-.jpg" border="0" alt="DateJast Rolex 36mm Silver Jubilee Bracelet Swiss Watch" title=" DateJast Rolex 36mm Silver Jubilee Bracelet Swiss Watch " width="180" height="223.0985915493"></a><br><a style="" href="http://www.replicamaster.com/datejast-rolex-36mm-silver-jubilee-bracelet-swiss-watch-p-4369.html?osCsid=2e69ba852cd807339d1117e15739f428"><p class="prod_name_listing">DateJast Rolex 36mm Silver Jubilee Bracelet Swiss Watch</p></a><br>
This is only part of the code, but the rest follows the same format. From code like this I want to extract
XX.html
For example: http://www.replicamaster.com/datejast-rolex-36mm-silver-jubilee-bracelet-swiss-watch-p-4369.html(?osCsid=2e69ba852cd807339d1117e15739f428). The part in parentheses is optional. As you can see, this snippet contains the same link twice; I only need one of them. Any help from the experts would be much appreciated.
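// .NET allows variable-length lookbehind, so only the first <a> directly after each product cell's <td> is matched.
Regex reg = new Regex(@"(?is)(?<=<td[^>]*?class=""productListing-data""[^>]*?>\s*)<a[^>]*?href=""([^""]+)""[^>]*?>");
foreach (Match m in reg.Matches(str))
{
    Console.WriteLine(m.Groups[1].Value); // group 1 is the href value
}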
Strange.
Could it be that you forgot to make the href=""([^""]+) part non-greedy?
Try changing it to: @"(?is)(?<=<td[^>]*?class=""productListing-data""[^>]*?>\s*)<a[^>]*?href=""([^""]+?)""[^>]*?>"
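A quick way to test this suggestion: because `[^""]` can never match a quote character, the greedy and the non-greedy form should capture the same text. The sketch below is not from the thread; the class name and the shortened one-cell sample string are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class GreedyVsLazyCheck
{
    static void Main()
    {
        // Shortened one-cell sample modelled on the HTML in the question (assumption).
        string html = @"<td align=""left"" class=""productListing-data"">"
                    + @"<a style=""font-size:8pt"" href=""http://www.replicamaster.com/datejast-rolex-36mm-silver-jubilee-bracelet-swiss-watch-p-4369.html"">"
                    + @"<img src=""images/watches-25082011-098-rm-.jpg""></a>";

        // Shared prefix of the two patterns posted above.
        string prefix = @"(?is)(?<=<td[^>]*?class=""productListing-data""[^>]*?>\s*)<a[^>]*?href=""";
        string greedy = prefix + @"([^""]+)""[^>]*?>";
        string lazy   = prefix + @"([^""]+?)""[^>]*?>";

        string a = Regex.Match(html, greedy).Groups[1].Value;
        string b = Regex.Match(html, lazy).Groups[1].Value;

        Console.WriteLine(a);      // the captured URL
        Console.WriteLine(a == b); // True: both forms stop at the closing quote
    }
}
```

If both lines agree on the real page as well, the missing results are more likely caused by something else, for example the input string not containing the expected markup.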
<table border="0" width="100%" cellspacing="0" cellpadding="2" class="productListing">
<tr>
<td align="left" class="productListing-data"><a style="font-size:8pt" href="http://www.replicamaster.com/datejast-rolex-36mm-silver-jubilee-bracelet-swiss-watch-p-4369.html"><img src="images/watches-25082011-098-rm-.jpg" border="0" alt="DateJast Rolex 36mm Silver Jubilee Bracelet Swiss Watch" title=" DateJast Rolex 36mm Silver Jubilee Bracelet Swiss Watch " width="180" height="223.0985915493"></a><br><a style="" href="http://www.replicamaster.com/datejast-rolex-36mm-silver-jubilee-bracelet-swiss-watch-p-4369.html"><p class="prod_name_listing">DateJast Rolex 36mm Silver Jubilee Bracelet Swiss Watch</p></a><br>
<span style="text-align:center; color:000000; text-decoration:none; font-size:10pt; font-weight:bold">Price: $599</span></a>
<div style="height:12px"></div>
</td>
<td align="left" class="productListing-data"><a style="font-size:8pt" href="http://www.replicamaster.com/datejast-rolex-36mm-silver-jubilee-bracelet-swiss-watch-p-4368.html"><img src="images/watches-25082011-086-rm-.jpg" border="0" alt="DateJast Rolex 36mm Silver Jubilee Bracelet Swiss Watch" title=" DateJast Rolex 36mm Silver Jubilee Bracelet Swiss Watch " width="180" height="223.0985915493"></a><br><a style="" href="http://www.replicamaster.com/datejast-rolex-36mm-silver-jubilee-bracelet-swiss-watch-p-4368.html"><p class="prod_name_listing">DateJast Rolex 36mm Silver Jubilee Bracelet Swiss Watch</p></a><br>
<span style="text-align:center; color:000000; text-decoration:none; font-size:10pt; font-weight:bold">Price: $599</span></a>
<div style="height:12px"></div>
</td> <td align="left" class="productListing-data"><a style="font-size:8pt" href="http://www.replicamaster.com/datejast-rolex-36mm-silver-jubilee-bracelet-swiss-watch-p-4367.html"><img src="images/watches-25082011-074-rm-.jpg" border="0" alt="DateJast Rolex 36mm Silver Jubilee Bracelet Swiss Watch" title=" DateJast Rolex 36mm Silver Jubilee Bracelet Swiss Watch " width="180" height="223.0985915493"></a><br><a style="" href="http://www.replicamaster.com/datejast-rolex-36mm-silver-jubilee-bracelet-swiss-watch-p-4367.html"><p class="prod_name_listing">DateJast Rolex 36mm Silver Jubilee Bracelet Swiss Watch</p></a><br>
<span style="text-align:center; color:000000; text-decoration:none; font-size:10pt; font-weight:bold">Price: $599</span></a>
<div style="height:12px"></div>
</td>
</tr>
</table>
The code looks roughly like this.
Could the experts take another look?
To pmars: I am not familiar with regular expressions.
string tempStr = sr.ReadToEnd();
// Group 1 is the text matched inside the lookbehind; group 2 is the href value.
string pattern = @"(?im)(?<=(productListing-data[^>]*>))<a[^>]+href=""([^""]+)""[^>]*>";
//tempStr = Regex.Replace(tempStr,pattern,"$2");
MatchCollection mc = Regex.Matches(tempStr, pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline);
foreach (Match m in mc)
{
    // Write out each matched link
    string url = m.Groups[2].Value; // link address
    Response.Write(url + "<br/>");
}
There may be duplicates. Is that acceptable?
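If duplicates are a concern, one option is to drop the optional `?osCsid=...` part while matching and keep only the first occurrence of each URL. The following is a minimal sketch under assumptions: the class and method names, the `HashSet` approach, and the shortened sample cells are illustrative and not from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class UniqueProductLinks
{
    // Returns each product URL once, without its query string.
    static List<string> Extract(string html)
    {
        // The lookbehind keeps only the first <a> after each product cell;
        // group 1 stops at '?' so the session id is left out.
        string pattern = @"(?<=productListing-data[^>]*>\s*)<a[^>]+href=""([^""?]+)(?:\?[^""]*)?""";
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            string url = m.Groups[1].Value;
            if (seen.Add(url)) // Add returns false for a URL already seen
            {
                result.Add(url);
            }
        }
        return result;
    }

    static void Main()
    {
        string baseUrl = "http://www.replicamaster.com/datejast-rolex-36mm-silver-jubilee-bracelet-swiss-watch-p-";
        // Shortened cell template modelled on the sample HTML (assumption).
        string cell = "<td align=\"left\" class=\"productListing-data\"><a style=\"font-size:8pt\" href=\"{0}\">"
                    + "<img src=\"x.jpg\"></a><br><a style=\"\" href=\"{0}\">name</a><br></td>";
        string html = string.Format(cell, baseUrl + "4369.html?osCsid=2e69ba852cd807339d1117e15739f428")
                    + string.Format(cell, baseUrl + "4369.html?osCsid=2e69ba852cd807339d1117e15739f428")
                    + string.Format(cell, baseUrl + "4368.html");

        foreach (string url in Extract(html))
        {
            Console.WriteLine(url); // prints the 4369 URL once, then the 4368 URL
        }
    }
}
```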
<br><Br>
<br>The pictures you see were taken by the <b>ReplicaMaster.com</b> team using the
<b>actual replica watch</b>, what you see here is exactly what we ship.
I want to extract data from this code.
For example, from XX:XXX extract XX and XXX.
As in this part of the code: <br>This is a <font color="#000000"><b></b></font> Watch<br><font color="#000000"><b>Movement:</b></font> 7750 Valjoux Swiss Made Top Grade<br><font color="#000000"><b>Functions:........
From it, extract Movement and 7750 Valjoux Swiss Made Top Grade.
string tempStr = File.ReadAllText(@"C:\Documents and Settings\Administrator\桌面\Test.txt", Encoding.GetEncoding("GB2312"));
// Group 2 captures the label inside the lookbehind; group 3 captures the value that follows it.
string pattern = @"(?im)(?<=(<br/?><font[^>]+><b>([^:]+):\s*</b>)</font>\s*)([^<]+)<br/?>";
//tempStr = Regex.Replace(tempStr,pattern,"$2");
MatchCollection mc = Regex.Matches(tempStr, pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline);
foreach (Match m in mc)
{
    // Loop over the matches
    string title = m.Groups[2].Value; // e.g. Movement
    string text = m.Groups[3].Value.Trim(); // e.g. 7750 Valjoux Swiss Made Top Grade (Trim removes the leading space)
}
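The same label/value extraction can also be written with named groups instead of numbered groups inside a lookbehind, which may be easier to read. This is a sketch under assumptions: the group names `title` and `text`, the class and method names, and the use of a `Dictionary` are illustrative choices; the sample string is the fragment quoted in the question, cut off before the truncated `Functions:` part.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class SpecFields
{
    // Collects "<b>Label:</b></font> value" pairs into a dictionary.
    static Dictionary<string, string> Parse(string html)
    {
        string pattern = @"<b>\s*(?<title>[^<:]+):\s*</b>\s*</font>\s*(?<text>[^<]+)";
        var fields = new Dictionary<string, string>();
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            // A later label with the same name overwrites the earlier one.
            fields[m.Groups["title"].Value.Trim()] = m.Groups["text"].Value.Trim();
        }
        return fields;
    }

    static void Main()
    {
        string html = @"<br>This is a <font color=""#000000""><b></b></font> Watch<br>"
                    + @"<font color=""#000000""><b>Movement:</b></font> 7750 Valjoux Swiss Made Top Grade<br>";

        foreach (KeyValuePair<string, string> kv in Parse(html))
        {
            Console.WriteLine(kv.Key + " = " + kv.Value); // Movement = 7750 Valjoux Swiss Made Top Grade
        }
    }
}
```

The empty `<b></b>` in the first line is skipped because the label group requires at least one character before the colon.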