<div class="bizDetail vcard">
<h1 class="bizName fn">Samaritano, John CPA PC</h1>
<div class="bizAddr">
<address class="adr">
<span class="street-address">555 Broadhollow Rd. (Rt. 110),Suite 230</span><span class="locality">Melville</span><span class="commaSep">,</span><span class="region">NY</span><span class="postal-code">11747-5078</span>
</address>
</div>
<div>
</div>
<div class="tel">
<span class="type">Local:</span>
<span class="value">(631) 249-5979</span>
</div>
<div class="tel">
<span class="type">Fax:</span>
<span class="value">(631) 249-7490</span>
</div>
<p class="call"><a href="http://www.yellowbook.com/clicktocall/call.aspx?listingId=1836670044&addressId=1&phoneType=Local" onclick="return GB_c2c('', this.href);" title="Click to call: Local Phone Number">Have this business call me</a></p>
<div class="bizWeb url"><a href="http://www.nycpas.net" target="_blank" onclick="OmLeadClick('profile: website link', true, '2558');PVifyExternalLink('img_external_1836670044_','/externaltracking?listingid=1836670044&listingtype=paidlisting_gold&url=http%253a%252f%252fwww.nycpas.net');" title="Go to website: www.nycpas.net">www.nycpas.net</a></div><div class="bizWeb email"><a href="mailto:[email protected]?subject=Link from yellowbook.com" onclick="OmLeadClick('profile: email link', false, '3442');" title="Email">[email protected]</a></div>
</div>
</div> <div class="threeColumn">
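I want to extract the following information from the string above:
(1) Samaritano, John CPA PC
(2) 555 Broadhollow Rd. (Rt. 110),Suite 230
(3) Melville,NY11747-5078
(4) (631) 249-5979
(5) www.nycpas.net
(6) [email protected]
I'd like to do this with a single regular expression. Any pointers would be appreciated, thanks.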
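Pieces separated by different tags don't seem to be capturable together in a single pass. Waiting for 过客 to weigh in.
A discontinuous string like class="region">NY</span><span class="postal-code">11747-5078</span> doesn't seem capturable as one contiguous value. Even if the pieces are captured into groups, they don't end up together. I wonder whether 过客 has a way.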
Why doesn't it work when I test it? My C# code:
MessageBox.Show(Regex.Match(Ls_Temp, @"(bizName fn|street-address|locality|class=""value""|title=""Email""|bizWeb url""> <a href="")["">]*(? <value>([^ <""]+(\w* </span>[\s\S]*? </address>)*))").ToString());
// r is assumed to be a Regex that defines a named group "value"; html is the page source.
MatchCollection mc = r.Matches(html);
string values = string.Empty;
foreach (Match m in mc)
{
// Skip matches in which the "value" group did not participate.
if (!m.Groups["value"].Success) continue;
// Strip any remaining tags and line breaks, then put one value per line.
//values += Regex.Replace(m.Groups["value"].Value, "<[^>]*>", "").Replace("\r\n", "") + "<br/>";
values += Regex.Replace(m.Groups["value"].Value, "<[^>]*>", "").Replace("\r\n", "") + "\r\n";
}
//Response.Write(values);
MessageBox.Show(values);
First time answering a question, heh.
That said, there is no really good way to handle this kind of requirement. Writing several regexes and matching several times is more flexible but somewhat less efficient; overall the two approaches come out about the same.
Regex reg = new Regex(@"(?is)<h1[^>]*>(?<h1>(?:(?!</?h1\b).)*)</h1>(?:(?!<span\b).)*<span class=""street-address"">(?<street>(?:(?!</span>).)*)</span>\s*<span[^>]*>(?<locality>(?:(?!</address>).)*)</address>(?:(?!Local:).)*Local:</span>\s*<span class=""value"">(?<local>(?:(?!</span>).)*)</span>[\s\S]*?<div class=""bizWeb url"">\s*<a[^>]*>(?<link>(?:(?!</a>).)*)</a>\s*</div>\s*<div\s*class=""bizWeb email"">\s*<a[^>]*>(?<email>(?:(?!</a>).)*)</a>");
Regex regTag = new Regex(@"<[^>]*>");
Match m = reg.Match(yourStr);
if (m.Success)
{
richTextBox2.Text += m.Groups["h1"].Value.Trim() + "\n";
richTextBox2.Text += m.Groups["street"].Value.Trim() + "\n";
richTextBox2.Text += regTag.Replace(m.Groups["locality"].Value, "").Trim() + "\n";
richTextBox2.Text += m.Groups["local"].Value.Trim() + "\n";
richTextBox2.Text += m.Groups["link"].Value.Trim() + "\n";
richTextBox2.Text += m.Groups["email"].Value.Trim() + "\n";
}
Samaritano, John CPA PC
555 Broadhollow Rd. (Rt. 110),Suite 230
Melville,NY11747-5078
(631) 249-5979
www.nycpas.net
[email protected]
Exactly right!
Thanks, 过客.
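It is really just several regexes concatenated together.
(?is) turns on ignore-case and single-line mode (in single-line mode the dot also matches newlines).
<h1[^>]*>(?<h1>(?:(?!</?h1\b).)*)</h1> captures the content of the h1 tag. The part (?<h1>(?:(?!</?h1\b).)*) could also be written with a non-greedy quantifier, which is simpler to write and makes little difference to efficiency: <h1[^>]*>(?<h1>.*?)</h1>. Which form is better depends mainly on the shape of the source string.
(?:(?!<span\b).)* Nothing is of interest until the next <span, so this matches everything that is not the start of a <span. The rest repeats the same technique: capture the content of each tag of interest and ignore everything else.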
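To make the comparison of the two h1 patterns concrete, here is a minimal sketch that runs both forms side by side. The class name H1Capture and the method Capture are hypothetical names used only for illustration; the two patterns are the ones quoted in the explanation above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class H1Capture
{
    // Tempered form: the dot is not allowed to step onto an <h1 or </h1 tag.
    public static readonly Regex Tempered =
        new Regex(@"(?is)<h1[^>]*>(?<h1>(?:(?!</?h1\b).)*)</h1>");

    // Non-greedy form: shorter to write, stops at the first </h1> it can reach.
    public static readonly Regex Lazy =
        new Regex(@"(?is)<h1[^>]*>(?<h1>.*?)</h1>");

    // Returns the text of the "h1" group, or null when the pattern does not match.
    public static string Capture(Regex regex, string html)
    {
        Match m = regex.Match(html);
        return m.Success ? m.Groups["h1"].Value : null;
    }
}
```

On the h1 line from the question both forms capture Samaritano, John CPA PC. They only diverge on irregular input: for <h1>a<h1>b</h1> the tempered form captures b, while the non-greedy form captures a<h1>b. This is the sense in which the choice depends on the shape of the source string.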
string ss = System.Text.RegularExpressions.Regex.Replace(str,@"(</?[^>]*>)|(\r\n)", "");
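过客, the website and the email are sometimes both present, sometimes both absent, and sometimes only one of them is there. How should the regex be modified?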
try...
Regex reg = new Regex(@"(?is)<h1[^>]*>(?<h1>(?:(?!</?h1\b).)*)</h1>(?:(?!<span\b).)*<span\s+class=""street-address"">(?<street>(?:(?!</span>).)*)</span>\s*<span[^>]*>(?<locality>(?:(?!</address>).)*)</address>(?:(?!Local:).)*Local:</span>\s*<span\s+class=""value"">(?<local>(?:(?!</span>).)*)</span>(?:(?!<div\s+class=""bizWeb\s+(?:url|email)"">).)*(<div\s+class=""bizWeb\s+url"">\s*<a[^>]*>(?<link>(?:(?!</a>).)*)</a>\s*</div>)?(\s*<div\s+class=""bizWeb\s+email"">\s*<a[^>]*>(?<email>(?:(?!</a>).)*)</a>)?");
Regex regTag = new Regex(@"<[^>]*>");
Match m = reg.Match(yourStr);
if (m.Success)
{
richTextBox2.Text += m.Groups["h1"].Value.Trim() + "\n";
richTextBox2.Text += m.Groups["street"].Value.Trim() + "\n";
richTextBox2.Text += regTag.Replace(m.Groups["locality"].Value, "").Trim() + "\n";
richTextBox2.Text += m.Groups["local"].Value.Trim() + "\n";
richTextBox2.Text += m.Groups["link"].Value.Trim() + "\n";
richTextBox2.Text += m.Groups["email"].Value.Trim() + "\n";
}
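As an alternative for the optional website and email, here is a minimal sketch of the several-regexes approach mentioned earlier in the thread: each field gets its own small pattern, so a missing field yields an empty string instead of affecting the other fields. The class name ListingFields, the method Extract, the group name v and the dictionary keys are hypothetical names chosen for illustration, and the patterns assume the markup shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ListingFields
{
    private const RegexOptions Opts = RegexOptions.IgnoreCase | RegexOptions.Singleline;

    // Removes any tags left inside a captured value.
    private static readonly Regex Tag = new Regex(@"<[^>]*>");

    // Field name -> pattern; every pattern exposes its text in a group named "v".
    private static readonly Dictionary<string, Regex> Patterns = new Dictionary<string, Regex>
    {
        { "name",     new Regex(@"<h1[^>]*>(?<v>.*?)</h1>", Opts) },
        { "street",   new Regex(@"<span\s+class=""street-address"">(?<v>.*?)</span>", Opts) },
        { "locality", new Regex(@"<span\s+class=""locality"">(?<v>.*?)</address>", Opts) },
        { "local",    new Regex(@"Local:</span>\s*<span\s+class=""value"">(?<v>.*?)</span>", Opts) },
        { "link",     new Regex(@"<div\s+class=""bizWeb\s+url"">\s*<a[^>]*>(?<v>.*?)</a>", Opts) },
        { "email",    new Regex(@"<div\s+class=""bizWeb\s+email"">\s*<a[^>]*>(?<v>.*?)</a>", Opts) }
    };

    public static Dictionary<string, string> Extract(string html)
    {
        var result = new Dictionary<string, string>();
        foreach (KeyValuePair<string, Regex> p in Patterns)
        {
            Match m = p.Value.Match(html);
            // A field that is absent from the page becomes "" rather than failing the whole extraction.
            result[p.Key] = m.Success ? Tag.Replace(m.Groups["v"].Value, "").Trim() : "";
        }
        return result;
    }
}
```

With the page source in yourStr, ListingFields.Extract(yourStr)["email"] would be empty whenever the email div is absent, and the remaining fields are read exactly as before. The cost is six scans of the string instead of one, which is the flexibility-versus-efficiency trade-off described earlier.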