I have a set of strings that look like this:
00-01-08 (hex) AVLAB Technology, Inc.
000108 (base 16) AVLAB Technology, Inc.
3F-1, No. 134, Sec. 3
Chung Shin Road
Hsin Tien, Taipei
TAIWAN, PROVINCE OF CHINA
and I want to turn them into this:
m_Companies.Add("00-01-08", new PhysAddressCompany("00-01-08", "000108", "AVLAB Technology, Inc.", "3F-1, No. 134, Sec. 3 Chung Shin Road Hsin Tien, Taipei TAIWAN, PROVINCE OF CHINA"));
But the txt file also contains entries like these:
00-01-09 (hex) Nagano Japan Radio Co., Ltd.
000109 (base 16) Nagano Japan Radio Co., Ltd.
Nagano Japan Radio Co., Ltd.

00-01-01 (hex) PRIVATE
000101 (base 16)
which should become:
m_Companies.Add("00-01-09", new PhysAddressCompany("00-01-09", "000109", "Nagano Japan Radio Co., Ltd.", ""));
m_Companies.Add("00-01-01", new PhysAddressCompany("00-01-01", "000101", "PRIVATE", ""));
There are also characters that only decode correctly as UTF-8, for example:
00-01-2A (hex) Telematica Sistems Inteligente
00012A (base 16) Telematica Sistems Inteligente
Rua Miguel Casagrande, 200
São Paulo
BRAZIL
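Each group has a blank line above and below it. Can a regular expression pick the groups out, or is there a simpler, less error-prone way? I also can't work out how to remove the parenthesized parts of the first and second lines. There are about 15,000 such groups, so I'm looking for a method (regex or otherwise) that is fast and efficient.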
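A minimal sketch of the simpler, less error-prone route the question asks about: treat each blank-line-separated group as an array of lines and cut the two header lines at the literal markers `(hex)` and `(base 16)` with `IndexOf`, which drops the parenthesized parts without any regex. It assumes the group has already been split off and that lines 1 and 2 always carry those markers; `OuiEntry`, `OuiGroupParser` and `SplitAround` are hypothetical names, not the asker's `PhysAddressCompany`.

```csharp
using System;
using System.Linq;

// Hypothetical container for one group (illustrative only, not PhysAddressCompany).
public record OuiEntry(string Hex, string Base16, string Company, string Address);

public static class OuiGroupParser
{
    // Parses the lines of one group. Assumes line 1 contains "(hex)" and
    // line 2 contains "(base 16)", as in the samples in the question.
    public static OuiEntry Parse(string[] lines)
    {
        // Trim tabs/spaces and ignore blank lines inside the group.
        string[] l = lines.Select(s => s.Trim()).Where(s => s.Length > 0).ToArray();
        var (hex, company) = SplitAround(l[0], "(hex)");
        string base16 = SplitAround(l[1], "(base 16)").Before;
        // Everything after the two header lines is the address, joined with single spaces.
        string address = string.Join(" ", l.Skip(2));
        return new OuiEntry(hex, base16, company, address);
    }

    // Returns the trimmed text before and after the marker; the marker itself
    // (the parenthesized part) is dropped.
    private static (string Before, string After) SplitAround(string line, string marker)
    {
        int i = line.IndexOf(marker, StringComparison.Ordinal);
        if (i < 0) throw new FormatException($"\"{marker}\" not found in: {line}");
        return (line.Substring(0, i).Trim(), line.Substring(i + marker.Length).Trim());
    }
}
```

The 00-01-09 sample has a third line, "Nagano Japan Radio Co., Ltd.", which this rule turns into the address, while the question expects an empty string there; that case would need an extra rule.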
Replies:
using System.Linq;
using System.Text.RegularExpressions;

string input = @"00-01-08 (hex) AVLAB Technology, Inc.
000108 (base 16) AVLAB Technology, Inc.
3F-1, No. 134, Sec. 3
Chung Shin Road
Hsin Tien, Taipei
TAIWAN, PROVINCE OF CHINA

00-01-09 (hex) Nagano Japan Radio Co., Ltd.
000109 (base 16) Nagano Japan Radio Co., Ltd.
Nagano Japan Radio Co., Ltd.

00-01-01 (hex) PRIVATE
000101 (base 16)
";
string pattern = @"(?i)(\d+-\d+-\d+)\s*?\([^()]*?\)\s*?([^\n]*?)\s*?(\d+)\s*?\([^()]*?\)\s*?[^\n]*?\s*?\n\s*?(\s*?(?<v>[^\n]*?)\s*?\n?\s*?)*?(?=\d+-|$)";
var result = Regex.Matches(input, pattern).OfType<Match>().Select(a => new
{
    v1 = a.Groups[1].Value,  // e.g. "00-01-08"
    v2 = a.Groups[2].Value,  // company name
    v3 = a.Groups[3].Value,  // e.g. "000108"
    v4 = string.Join(" ", a.Groups["v"].Captures.OfType<Capture>().Select(b => b.Value))  // address lines
});
/* Output:
[0] { v1 = "00-01-08", v2 = " AVLAB Technology, Inc.", v3 = "000108", v4 = " 3 F - 1 , N o . 1 3 4 , S e c . 3 C h u n g S h i n R o a d H s i n T i e n , T a i p e i T A I W A N , P R O V I N C E O F C H I N A " }
[1] { v1 = "00-01-09", v2 = " Nagano Japan Radio Co., Ltd.", v3 = "000109", v4 = " N a g a n o J a p a n R a d i o C o . , L t d . " }
[2] { v1 = "00-01-01", v2 = " PRIVATE", v3 = "000101", v4 = "" }
Note: v2 keeps the space that follows "(hex)", and v4 has a space between every character,
most likely because the lazy (?<v>[^\n]*?) captures one character per repetition and
string.Join(" ", ...) puts a space between the captures. */
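For the regex route, one way to drop the `(hex)` and `(base 16)` parts is to match them as literals outside the capture groups, so they never appear in the captured values. The sketch below (hypothetical names `OuiHeaderRegex` and `Headers`) only extracts the two header lines and leaves the address lines to a separate step. It anchors each match at a line start with `RegexOptions.Multiline`, uses `[ \t]` rather than `\s` so a match cannot spill into the next line, and uses `[0-9A-F]` instead of `\d` so an OUI such as 00-01-2A is matched too.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OuiHeaderRegex
{
    // Matches a "(hex)" line and the "(base 16)" line directly below it.
    // The parenthesized markers sit outside the named groups, so they are
    // simply not captured. Works with both \n and \r\n line endings.
    private static readonly Regex Header = new Regex(
        @"^(?<hex>[0-9A-F]{2}-[0-9A-F]{2}-[0-9A-F]{2})[ \t]+\(hex\)[ \t]*(?<name>[^\r\n]*)\r?\n" +
        @"[ \t]*(?<b16>[0-9A-F]{6})[ \t]+\(base 16\)",
        RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns (hex, base16, company) for every header pair found in the text.
    public static List<(string Hex, string Base16, string Company)> Headers(string text)
    {
        var result = new List<(string Hex, string Base16, string Company)>();
        foreach (Match m in Header.Matches(text))
        {
            result.Add((m.Groups["hex"].Value,
                        m.Groups["b16"].Value,
                        m.Groups["name"].Value.Trim()));
        }
        return result;
    }
}
```

`RegexOptions.Compiled` costs some start-up time but is usually worthwhile when one pattern is reused over a large file.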
string input = File.ReadAllText(@"C:\Users\myx\Desktop\Test.txt", Encoding.UTF8); // read the txt file; UTF-8 so that names such as "São Paulo" decode correctly
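With about 15,000 groups, it may be simpler to stream the file than to run one large regex over a single string. A sketch, assuming the file is UTF-8 (the question says some names only decode correctly that way) and that groups are separated by lines that are empty or whitespace-only: `File.ReadLines` yields lines lazily instead of loading the whole file, and the hypothetical `Groups` helper collects them into one list per group.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class OuiGroupReader
{
    // Splits a line sequence into groups separated by one or more blank
    // (empty or whitespace-only) lines.
    public static IEnumerable<List<string>> Groups(IEnumerable<string> lines)
    {
        var current = new List<string>();
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
            {
                if (current.Count > 0)
                {
                    yield return current;
                    current = new List<string>();  // new list: the caller keeps the old one
                }
            }
            else
            {
                current.Add(line);
            }
        }
        if (current.Count > 0) yield return current;  // the last group may lack a trailing blank line
    }

    // Reads the file lazily as UTF-8 and yields its groups one by one.
    public static IEnumerable<List<string>> ReadGroups(string path) =>
        Groups(File.ReadLines(path, Encoding.UTF8));
}
```

Each group can then be handed to whatever per-group parser is used, so only one group is held in memory at a time.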
http://standards.ieee.org/develop/regauth/oui/oui.txt
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

string Url = "http://standards.ieee.org/develop/regauth/oui/oui.txt";
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(Url);
using (HttpWebResponse res = (HttpWebResponse)req.GetResponse())
using (StreamReader sr = new StreamReader(res.GetResponseStream(), Encoding.UTF8))
{
    string txt = sr.ReadToEnd();
    // [\da-z] instead of \d, so that hex digits such as the "2A" in 00-01-2A also match
    string pattern = @"(?i)([\da-z]+-[\da-z]+-[\da-z]+)\s*?\([^()]*?\)\s*?([^\n]*?)\s*?([\da-z]+)\s*?\([^()]*?\)\s*?[^\n]*?\s*?\n\s*?(\s*?(?<v>[^\n]*?)\s*?\n?\s*?)*?(?=\n|$)";
    // earlier version (decimal digits only):
    //string pattern = @"(?i)(\d+-\d+-\d+)\s*?\([^()]*?\)\s*?([^\n]*?)\s*?(\d+)\s*?\([^()]*?\)\s*?[^\n]*?\s*?\n\s*?(\s*?(?<v>[^\n]*?)\s*?\n?\s*?)*?(?=\d+-|$)";
    var result = Regex.Matches(txt, pattern).OfType<Match>().Select(a => new
    {
        v1 = a.Groups[1].Value,
        v2 = a.Groups[2].Value,
        v3 = a.Groups[3].Value,
        v4 = string.Join(" ", a.Groups["v"].Captures.OfType<Capture>().Select(b => b.Value))
    }).ToList();  // materialize while result is still in scope
}
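The download code above fills v1 to v4 but stops before producing the `m_Companies.Add(...)` lines the question wants. A minimal sketch of that last step, assuming the four fields have already been extracted and trimmed: it builds each line as text and escapes quotes, backslashes and control characters so an unusual company name cannot break the generated C#. `CompanyLineWriter`, `ToAddLine` and `Literal` are hypothetical names; `m_Companies` and `PhysAddressCompany` are only emitted as text, taken from the question.

```csharp
using System.Text;

public static class CompanyLineWriter
{
    // Builds one line in the format shown in the question. It only produces
    // source text; it does not call m_Companies or PhysAddressCompany.
    public static string ToAddLine(string hex, string base16, string company, string address)
    {
        string h = Literal(hex);
        return $"m_Companies.Add({h}, new PhysAddressCompany({h}, {Literal(base16)}, {Literal(company)}, {Literal(address)}));";
    }

    // Wraps text in a regular C# string literal, escaping the characters
    // that would otherwise break it.
    private static string Literal(string s)
    {
        var sb = new StringBuilder(s.Length + 2);
        sb.Append('"');
        foreach (char c in s)
        {
            switch (c)
            {
                case '\\': sb.Append(@"\\"); break;
                case '"': sb.Append("\\\""); break;
                case '\r': sb.Append(@"\r"); break;
                case '\n': sb.Append(@"\n"); break;
                case '\t': sb.Append(@"\t"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.Append('"').ToString();
    }
}
```

For example, `ToAddLine("00-01-01", "000101", "PRIVATE", "")` yields the `m_Companies.Add("00-01-01", ...)` line from the question. The lines can then be written with `File.WriteAllLines(path, lines, Encoding.UTF8)`; saving the generated .cs file as UTF-8 keeps names such as "São Paulo" intact.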