Page content:
<body>
<font color="#008000">thu63.com/view/1047.htm 2011-8-25</font>
<span class="g">www.seo.com/bbs/ 2011-11-25 </span>
<span class="g"><b>seo</b>.chinaz.com/ 2011-12-7 </span>
<span class="g">www.seowhy.com/bbs/ 2011-11-25 </span>
<span class="g">www.seozac.com/ 2011-12-5 </span>
<span class="g">baiduseoguide.com/ 2011-12-11 </span>
<font color=#008000>jualey.com/seo 2011-12-16 </font>
<span class="g">www.<b>seo</b>bbs.net/ 2011-12-15 </span>
</body>
The code and regular expression I am using:
Console.WriteLine("Enter a URL:");
string myUrl = Console.ReadLine();
Console.WriteLine("Extracting hyperlinks, please wait...");
string strRegex = "(?<=<span class=\"g\">).*?(?=/)"; // the regular expression I am using
MatchCollection mc = Regex.Matches(strCode, strRegex);
foreach (Match m in mc)
{
sw.Write("{0}\r\n", m.Value);
}
The incorrect result I get is:
www.seo.com
<b>seo<
www.seowhy.com
www.seozac.com
baiduseoguide.com
www.<b>seo<
www.dunsh.org
www.<b>seo<
whereas the result I want is:
www.seo.com
seo.chinaz.com
www.seowhy.com
www.seozac.com
baiduseoguide.com
..
..
How should I rewrite this regular expression, string strRegex = "(?<=<span class=\"g\">).*?(?=/)"; so that it filters out the <b> and </b> inside the matches?
string myUrl = Console.ReadLine();
Console.WriteLine("Extracting hyperlinks, please wait...");
string strRegex = "(?<=<span class=\"g\">).*?(?=(?<!<)/)"; // the "/" must not follow "<", so "</b>" no longer ends the match
MatchCollection mc = Regex.Matches(strCode, strRegex);
foreach (Match m in mc)
{
sw.Write("{0}\r\n", Regex.Replace(m.Value, "</?b>", "")); // strip <b> and </b> from each match
}
After running your version, the result is:
<b>seo</b>.chinaz.com
www.seowhy.com
baiduseoguide.com
www.seozac.com
www.tui18.com
www.<b>seo</b>bbs.net
www.tuoqing.com
The <b> and </b> in these results still have not been removed,
especially with this line: sw.Write("{0}\r\n", Regex.Replace(m.Value, "</?b>", ""));
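
For reference, here is a minimal sketch of one way to get the desired output. It is not necessarily what the reply above intended. The original pattern stops at the first "/", and that "/" can be the one inside "</b>", which is why matches such as "<b>seo<" appear. The sketch captures the whole span first, strips the <b> tags, and only then cuts at the first "/". The class name HostExtractor and the method ExtractHosts are hypothetical names introduced for illustration, and the input is assumed to look like the sample page shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class HostExtractor
{
    // Hypothetical helper: returns the host part of every <span class="g"> entry.
    public static List<string> ExtractHosts(string html)
    {
        var hosts = new List<string>();
        // Capture everything inside the span, inner tags included.
        foreach (Match m in Regex.Matches(html, "<span class=\"g\">(.*?)</span>", RegexOptions.Singleline))
        {
            // Remove <b> and </b> first, then keep the text before the first "/".
            string text = Regex.Replace(m.Groups[1].Value, "</?b>", "").Trim();
            int slash = text.IndexOf('/');
            hosts.Add(slash >= 0 ? text.Substring(0, slash) : text);
        }
        return hosts;
    }

    static void Main()
    {
        // Two entries taken from the sample page in the question.
        string html = "<span class=\"g\"><b>seo</b>.chinaz.com/ 2011-12-7 </span>\n" +
                      "<span class=\"g\">www.<b>seo</b>bbs.net/ 2011-12-15 </span>";
        foreach (string host in ExtractHosts(html))
            Console.WriteLine(host);
    }
}
```

If the tags still show up in your output file, it is worth checking that the value passed to sw.Write is the result of Regex.Replace and not m.Value itself, since Regex.Replace returns a new string and leaves the match untouched.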