<div class="pic"><a href="http://www.mbaobao.com/pshow-10223010.html" target="_blank"><img src="http://images.mbaobao.com/fansface/10223010/s.gif" width="160" height="160" alt="[FansFace]运动休闲两用旅行包 黑色" /></a></div>
<div class="text">
<div class="title"><a href="http://www.mbaobao.com/pshow-10223010.html" target="_blank" title="[FansFace]运动休闲两用旅行包 黑色">[FansFace]运动休闲两用旅行包 黑色</a></div>
市场价:<s>398.00</s><br />
麦包价:<span class="price">159.00</span><br />
<span class="buy">
<a class="hand" onclick="add_goods_to_cart(3897);" rel="nofollow"><img src="http://www.mbaobao.com/templates/default/images/btn_cart.gif" width="87" height="22" alt="加入购物车" /></a>
<a class="hand" onclick="add_goods_to_favorites(3897);" rel="nofollow"><img src="http://www.mbaobao.com/templates/default/images/btn_favor.gif" width="50" height="22" alt="收藏" /></a>
</span>
</div>
From this string I want to extract the following pieces of information:
http://www.mbaobao.com/pshow-10223010.html (the href link URL)
http://images.mbaobao.com/fansface/10223010/s.gif (the image URL)
398.00 (the market price)
159.00 (the Mbaobao price)
Thanks~~~
Image URL: (?<=src=")[^"]+
Market price: (?<=市场价:<s>)\d+\.?\d*
Mbaobao price: (?<=麦包价:<span class="price">)\d+\.?\d*
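A minimal sketch of how the three lookbehind patterns above might be used from C#, assuming the input is a trimmed copy of the snippet from the question. Each pattern returns only the value, because the lookbehind (?<=...) requires the prefix without including it in the match. Note that the image pattern simply takes the first src="..." in the string.

```csharp
using System;
using System.Text.RegularExpressions;

// Trimmed copy of the snippet from the question; in a C# verbatim string "" is one quote character.
string html = @"<div class=""pic""><a href=""http://www.mbaobao.com/pshow-10223010.html"" target=""_blank""><img src=""http://images.mbaobao.com/fansface/10223010/s.gif"" width=""160"" height=""160"" /></a></div>
<div class=""text"">
市场价:<s>398.00</s><br />
麦包价:<span class=""price"">159.00</span><br />
</div>";

// Lookbehind keeps the prefix out of the result, so .Value is just the URL or the number.
string imageUrl = Regex.Match(html, @"(?<=src="")[^""]+").Value;
string marketPrice = Regex.Match(html, @"(?<=市场价:<s>)\d+\.?\d*").Value;
string mbaobaoPrice = Regex.Match(html, @"(?<=麦包价:<span class=""price"">)\d+\.?\d*").Value;

Console.WriteLine(imageUrl);     // first src="..." value in the string
Console.WriteLine(marketPrice);
Console.WriteLine(mbaobaoPrice);
```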
\"([^\"]+\.(gif|jpg|bmp|...))\" matches image URLs. The ellipsis stands for any other extensions you want to add; you can add as many as you like.
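A rough sketch of the extension-based pattern in use, assuming jpeg and png as example stand-ins for the "..." (they are illustrative, not from the answer). Group 1 holds the URL without its quotes, and Regex.Matches returns every quoted value that ends in one of the listed extensions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Part of the snippet from the question: the product image plus the "add to cart" button image.
string html = @"<img src=""http://images.mbaobao.com/fansface/10223010/s.gif"" width=""160"" height=""160"" />
<a class=""hand"" onclick=""add_goods_to_cart(3897);"" rel=""nofollow""><img src=""http://www.mbaobao.com/templates/default/images/btn_cart.gif"" width=""87"" height=""22"" alt=""加入购物车"" /></a>";

// jpeg and png are example extensions standing in for the "..." in the answer.
var imagePattern = new Regex(@"""([^""]+\.(gif|jpg|jpeg|png|bmp))""", RegexOptions.IgnoreCase);

var imageUrls = new List<string>();
foreach (Match m in imagePattern.Matches(html))
{
    // Group 1 is the URL without the surrounding quotes.
    imageUrls.Add(m.Groups[1].Value);
}
imageUrls.ForEach(Console.WriteLine);
```

Because the pattern only looks at the file extension, it also picks up button images such as btn_cart.gif; filtering by path (for example, keeping only URLs under /fansface/) would be needed if only product photos are wanted.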
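using System.Text.RegularExpressions;
string s = "your string here...";
string sPatternLink = "(?<=href=\")[^\"]+"; // lookbehind: everything after href=" up to the next quote
string sLink = Regex.Match(s, sPatternLink, RegexOptions.IgnoreCase).ToString();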
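A small check of why the href pattern must not end with a space, sketched on a trimmed copy of the question's snippet. With a trailing space, [^"]+ has to be followed by a space, but the URL is followed by a quote, so nothing matches; Match.ToString() then quietly returns an empty string, which is why testing Success is useful.

```csharp
using System;
using System.Text.RegularExpressions;

string s = @"<div class=""pic""><a href=""http://www.mbaobao.com/pshow-10223010.html"" target=""_blank""><img src=""http://images.mbaobao.com/fansface/10223010/s.gif"" /></a></div>";

// Keeps the trailing space from the posted pattern: it demands a space right after the URL.
Match broken = Regex.Match(s, "(?<=href=\")[^\"]+ ", RegexOptions.IgnoreCase);

// Without the trailing space the lookbehind pattern returns the link.
Match fixedMatch = Regex.Match(s, "(?<=href=\")[^\"]+", RegexOptions.IgnoreCase);

// Value (same as ToString()) is "" when nothing matched, so check Success first.
Console.WriteLine(broken.Success ? broken.Value : "(no match)");
Console.WriteLine(fixedMatch.Success ? fixedMatch.Value : "(no match)");
```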
Regex reg = new Regex(@"(?is)<div class=""pic""><a href=""(?<url>[^""]*)""[^>]*><img src=""(?<img>[^""]*)""[^>]*>(?:(?!市场价:).)*市场价:<s>(?<p>[\d.]+)(?:(?!麦包价:).)*麦包价:<span[^>]*>(?<mp>[\d.]+)</span>");
MatchCollection mc = reg.Matches(yourStr);
foreach (Match m in mc)
{
    richTextBox2.Text += m.Groups["url"].Value + "\n";
    richTextBox2.Text += m.Groups["img"].Value + "\n";
    richTextBox2.Text += m.Groups["p"].Value + "\n";
    richTextBox2.Text += m.Groups["mp"].Value + "\n";
}
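The same named-group regex as a self-contained console sketch: a list of tuples replaces richTextBox2 (a WinForms control) so it can run on its own, and the input is a trimmed copy of the question's snippet. With (?is), the . in the tempered parts (?:(?!市场价:).)* also crosses line breaks, and each match yields one product's url, img, p and mp groups.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string yourStr = @"<div class=""pic""><a href=""http://www.mbaobao.com/pshow-10223010.html"" target=""_blank""><img src=""http://images.mbaobao.com/fansface/10223010/s.gif"" width=""160"" height=""160"" /></a></div>
<div class=""text"">
市场价:<s>398.00</s><br />
麦包价:<span class=""price"">159.00</span><br />
</div>";

var reg = new Regex(@"(?is)<div class=""pic""><a href=""(?<url>[^""]*)""[^>]*><img src=""(?<img>[^""]*)""[^>]*>(?:(?!市场价:).)*市场价:<s>(?<p>[\d.]+)(?:(?!麦包价:).)*麦包价:<span[^>]*>(?<mp>[\d.]+)</span>");

// One entry per product block; the tuple list stands in for richTextBox2.
var products = new List<(string Url, string Img, string Market, string Mbaobao)>();
foreach (Match m in reg.Matches(yourStr))
{
    products.Add((m.Groups["url"].Value, m.Groups["img"].Value, m.Groups["p"].Value, m.Groups["mp"].Value));
}

foreach (var item in products)
    Console.WriteLine($"{item.Url}\n{item.Img}\n{item.Market}\n{item.Mbaobao}");
```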
The source is http://www.mbaobao.com/,
or one of its pages, such as http://www.mbaobao.com/c-41/,
which I download with the following code:
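using System.Net;
string url = "http://www.mbaobao.com/c-41/";
WebClient client = new WebClient();
byte[] page = client.DownloadData(url);
// Decode as GB2312; on .NET Core / .NET 5+ this encoding needs Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.
string content = System.Text.Encoding.GetEncoding("gb2312").GetString(page);
What I want is to get the image URLs, image titles, prices and other details out of content.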
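One possible way to pull these fields out of the downloaded page, sketched under the assumption that every product on the page repeats the markup of the snippet in the question. ExtractProducts is a hypothetical helper name; it extends the named-group idea with a title group read from the div class="title" link, and uses lazy .*? between the fields. A copy of the snippet stands in for the downloaded content so the sketch runs offline.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// In the real program content would come from the page as in the question, e.g.:
//   byte[] page = new WebClient().DownloadData("http://www.mbaobao.com/c-41/");
//   string content = Encoding.GetEncoding("gb2312").GetString(page);
string content = @"<div class=""pic""><a href=""http://www.mbaobao.com/pshow-10223010.html"" target=""_blank""><img src=""http://images.mbaobao.com/fansface/10223010/s.gif"" width=""160"" height=""160"" alt=""[FansFace]运动休闲两用旅行包 黑色"" /></a></div>
<div class=""text"">
<div class=""title""><a href=""http://www.mbaobao.com/pshow-10223010.html"" target=""_blank"" title=""[FansFace]运动休闲两用旅行包 黑色"">[FansFace]运动休闲两用旅行包 黑色</a></div>
市场价:<s>398.00</s><br />
麦包价:<span class=""price"">159.00</span><br />
</div>";

var products = ExtractProducts(content);
foreach (var item in products)
    Console.WriteLine($"{item.Title} | {item.Url} | {item.Img} | {item.Market} | {item.Mbaobao}");

// Hypothetical helper: one tuple per product block that follows the snippet's markup.
static List<(string Url, string Img, string Title, string Market, string Mbaobao)> ExtractProducts(string html)
{
    var reg = new Regex(
        @"<div class=""pic""><a href=""(?<url>[^""]*)""[^>]*><img src=""(?<img>[^""]*)""[^>]*>" +
        @".*?<div class=""title""><a[^>]*>(?<title>[^<]*)</a>" +
        @".*?市场价:<s>(?<p>[\d.]+)</s>" +
        @".*?麦包价:<span[^>]*>(?<mp>[\d.]+)</span>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    var result = new List<(string Url, string Img, string Title, string Market, string Mbaobao)>();
    foreach (Match m in reg.Matches(html))
    {
        result.Add((m.Groups["url"].Value, m.Groups["img"].Value, m.Groups["title"].Value,
                    m.Groups["p"].Value, m.Groups["mp"].Value));
    }
    return result;
}
```

If a product block lacks one of the fields, the lazy .*? can run on into the next product, so it is worth spot-checking the results against the real page.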
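// regHttpUrl is a Regex that matches http:// URLs (its pattern was not posted)
string Httpurl = regHttpUrl.Matches(yourstring)[0].ToString();
string Imageurl = regHttpUrl.Matches(yourstring)[1].ToString();
This should give you both URLs; give it a try.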
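Since regHttpUrl is not shown, here is a sketch with an assumed pattern, http://[^"]+ (anything from http:// up to the next quote); the pattern is hypothetical, not the answerer's. On the question's snippet the first match is the product link and the second is the product image, and running Matches once avoids scanning the string twice.

```csharp
using System;
using System.Text.RegularExpressions;

string yourstring = @"<div class=""pic""><a href=""http://www.mbaobao.com/pshow-10223010.html"" target=""_blank""><img src=""http://images.mbaobao.com/fansface/10223010/s.gif"" width=""160"" height=""160"" /></a></div>";

// Hypothetical pattern: the answer does not show regHttpUrl, so this matches http:// up to the next quote.
var regHttpUrl = new Regex(@"http://[^""]+");

// Run the regex once and index into the result.
MatchCollection urls = regHttpUrl.Matches(yourstring);
string Httpurl = urls[0].ToString();   // first URL in the snippet: the product link
string Imageurl = urls[1].ToString();  // second URL: the product image
Console.WriteLine(Httpurl);
Console.WriteLine(Imageurl);
```

Indexing by position only works while the markup order stays the same; on a full product block the cart and favorite button images would also show up as later matches.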
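// re2 and temp come from the answerer's own code (not posted); re2 captures the price in group 1
string price = re2.Matches(temp)[0].ToString();
price = re2.Replace(price, "$1");
This gets the market price. The second one, the Mbaobao price, works the same way; just change the index to 1.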
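A sketch of the Matches-then-Replace approach with an assumed re2, since the original pattern was not posted: 价:<(?:s|span class="price")>([\d.]+) matches either price tag and captures the number in group 1, so index 0 is the market price and index 1 is the Mbaobao price. Replacing a match with "$1" leaves only the captured number.

```csharp
using System;
using System.Text.RegularExpressions;

string temp = @"市场价:<s>398.00</s><br />
麦包价:<span class=""price"">159.00</span><br />";

// Hypothetical re2: matches "价:<s>" or "价:<span class="price">" and captures the number after it.
var re2 = new Regex(@"价:<(?:s|span class=""price"")>([\d.]+)");

// Index 0 is the market price; Replace with "$1" keeps only the captured number.
string price = re2.Matches(temp)[0].ToString();
price = re2.Replace(price, "$1");

// Index 1 gives the Mbaobao price the same way.
string mbPrice = re2.Replace(re2.Matches(temp)[1].ToString(), "$1");

Console.WriteLine(price);
Console.WriteLine(mbPrice);
```

Reading re2.Matches(temp)[0].Groups[1].Value directly would give the same number without the extra Replace pass.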
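string test = @"<div class=""pic""><a href=""http://www.mbaobao.com/pshow-10223010.html"" target=""_blank""><img src=""http://images.mbaobao.com/fansface/10223010/s.gif"" width=""160"" height=""160"" alt=""[FansFace]运动休闲两用旅行包 黑色"" /></a></div>
<div class=""text"">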
<div class=""title""><a href=""http://www.mbaobao.com/pshow-10223010.html"" target=""_blank"" title=""[FansFace]运动休闲两用旅行包 黑色"">[FansFace]运动休闲两用旅行包 黑色</a></div>
市场价:<s>398.00</s><br />
麦包价:<span class=""price"">159.00</span><br />
<span class=""buy"">
<a class=""hand"" onclick=""add_goods_to_cart(3897);"" rel=""nofollow""><img src=""http://www.mbaobao.com/templates/default/images/btn_cart.gif"" width=""87"" height=""22"" alt=""加入购物车"" /></a>
<a class=""hand"" onclick=""add_goods_to_favorites(3897);"" rel=""nofollow""><img src=""http://www.mbaobao.com/templates/default/images/btn_favor.gif"" width=""50"" height=""22"" alt=""收藏"" /></a>
</span>
</div>";
Regex reg = new Regex(@"(?is)<div class=""pic""><a href=""(?<url>[^""]*)""[^>]*><img src=""(?<img>[^""]*)""[^>]*>(?:(?!市场价:).)*市场价:<s>(?<p>[\d.]+)(?:(?!麦包价:).)*麦包价:<span[^>]*>(?<mp>[\d.]+)</span>");
MatchCollection mc = reg.Matches(test);
foreach (Match m in mc)
{
    richTextBox2.Text += m.Groups["url"].Value + "\n";
    richTextBox2.Text += m.Groups["img"].Value + "\n";
    richTextBox2.Text += m.Groups["p"].Value + "\n";
    richTextBox2.Text += m.Groups["mp"].Value + "\n";
}
/*-----------Output-------------
http://www.mbaobao.com/pshow-10223010.html
http://images.mbaobao.com/fansface/10223010/s.gif
398.00
159.00
*/
I'll read your article carefully too; I expect to learn a lot from it, hehe~~~