I want to use a regular expression to match each URL in the left-hand column below in a single one-line match, splitting it into the groups shown to its right:
Path, File, Reco, Page, Format
Note: it has to be a single one-line match.
URL                           Path           File      Reco      Page      Format
/
/news                         news
/news/                        news
/news/gz                      news/gz
/news/gz/                     news/gz
/news/gz/list.aspx            news/gz        list                          aspx
/news/gz/list.10.html         news/gz        list      10                  html
/news/gz/list.10.1.html       news/gz        list      10        1         html
/news/gz/list_10.aspx         news/gz        list      10                  aspx
/news/gz/list_10_1.html       news/gz        list      10        1         html
/news/gz/list-10.html         news/gz        list      10                  html
/news/gz/list-10-1.html       news/gz        list      10        1         html
/list.html                                   list                          html
/list.11.html                                list      11                  html
/list.11.2.html                              list      11        2         html
/list_11.html                                list      11                  html
/list_11_2.html                              list      11        2         html
/list-11.html                                list      11                  html
/list-11-2.html                              list      11        2         html
http://(?<url>[^/]*)(?<path>/?.*/)?(?:(?<type>index|list|view)(?:[\._-](?<data>\d+))?(?:[\._-](?<page>\d+))?\.(?<exts>html|aspx))?
// Build the Regex once instead of on every loop iteration.
Regex reg = new Regex(@"^/(?:(?<Path>(?:[^/\s.]+/)*[^/\s.]+)/?(?=/|[^/]*$))?(?:(?<=/)(?<File>[^/\s._-]+)(?:[._-](?<Reco>\d+))?(?:[._-](?<Page>\d+))?(?:\.(?<Format>[^/\s._-]+)))?$");
foreach (string s in data)
{
    // The pattern is anchored with ^ and $, so each URL yields at most one match.
    Match m = reg.Match(s);
    if (!m.Success) continue;
    // One row per URL: the URL itself, then the five named groups in padded columns.
    richTextBox2.Text += s.PadRight(30) + m.Groups["Path"].Value.PadRight(15) + m.Groups["File"].Value.PadRight(10) + m.Groups["Reco"].Value.PadRight(10) + m.Groups["Page"].Value.PadRight(10) + m.Groups["Format"].Value + "\n";
}
/*-----Output------
/
/news                         news
/news/                        news
/news/gz                      news/gz
/news/gz/                     news/gz
/news/gz/list.aspx            news/gz        list                          aspx
/news/gz/list.10.html         news/gz        list      10                  html
/news/gz/list.10.1.html       news/gz        list      10        1         html
/news/gz/list_10.aspx         news/gz        list      10                  aspx
/news/gz/list_10_1.html       news/gz        list      10        1         html
/news/gz/list-10.html         news/gz        list      10                  html
/news/gz/list-10-1.html       news/gz        list      10        1         html
/list.html                                   list                          html
/list.11.html                                list      11                  html
/list.11.2.html                              list      11        2         html
/list_11.html                                list      11                  html
/list_11_2.html                              list      11        2         html
/list-11.html                                list      11                  html
/list-11-2.html                              list      11        2         html
*/
If the URL carries parameters, the pattern no longer recognizes it, for example:
/news?charset=chs&act=submit&user=guest
/news/gz/?charset=chs&act=submit&user=guest
/news/gz/list_10.12.html?charset=chs&act=submit&user=guest
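These URLs carry a QueryString. The parameters after the ? need to be recognized in the same single match; there may be one parameter or several, and their order should not matter. For example:
1. charset=chs&act=submit&user=guest
2. user=guest&act=submit
3. act=submit
For example, a regex that matches the two parameters in /news/gz/list.html?user=guest&charset=chs is: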
\?(?:&?charset=(?<charset>[^&]*)?|&?user=(?<user>[^&]*)?|&?\w+=(?<other>\w+)?)*
It works regardless of how many parameters there are and of the order they come in.
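One possible way to pick up the query string in the same match is sketched below. It is only a sketch under some assumptions: every parameter has the form name=value, and the names ParsedUrl, UrlSplitter, Key and Value are made up for this example. It relies on the fact that .NET keeps every capture of a repeated group in Group.Captures, so the parameter names do not have to be listed in the pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical result holder, introduced only for this example.
public sealed class ParsedUrl
{
    public string Path, File, Reco, Page, Format;
    public Dictionary<string, string> Query = new Dictionary<string, string>();
}

public static class UrlSplitter
{
    // Same structure as the path pattern, with two changes:
    //  - '?' is excluded from the path and file character classes, and the
    //    lookahead stops at '?', so the path part ends where the query begins;
    //  - an optional trailing group repeats name=value pairs, in any number
    //    and any order (assumption: each parameter contains '=').
    private static readonly Regex UrlPattern = new Regex(
        @"^/" +
        @"(?:(?<Path>(?:[^/\s.?]+/)*[^/\s.?]+)/?(?=/|[^/?]*(?:\?|$)))?" +
        @"(?:(?<=/)(?<File>[^/\s._?-]+)(?:[._-](?<Reco>\d+))?(?:[._-](?<Page>\d+))?\.(?<Format>[^/\s._?-]+))?" +
        @"(?:\?(?:(?<Key>[^=&\s]+)=(?<Value>[^&\s]*)&?)*)?$");

    // Returns null when the URL does not fit the pattern.
    public static ParsedUrl Parse(string url)
    {
        Match m = UrlPattern.Match(url);
        if (!m.Success) return null;

        var parsed = new ParsedUrl
        {
            Path = m.Groups["Path"].Value,
            File = m.Groups["File"].Value,
            Reco = m.Groups["Reco"].Value,
            Page = m.Groups["Page"].Value,
            Format = m.Groups["Format"].Value
        };

        // Each repetition of the pair group adds one capture to Key and one
        // to Value, so the two collections line up index by index.
        CaptureCollection keys = m.Groups["Key"].Captures;
        CaptureCollection values = m.Groups["Value"].Captures;
        for (int i = 0; i < keys.Count; i++)
        {
            // If a name occurs more than once, the last value wins.
            parsed.Query[keys[i].Value] = values[i].Value;
        }
        return parsed;
    }
}
```

With this sketch, UrlSplitter.Parse("/news/gz/list_10.12.html?user=guest&act=submit") should give Path news/gz, File list, Reco 10, Page 12, Format html, plus a Query dictionary holding user and act. A URL without a query string leaves Query empty, and a parameter without an = sign makes the whole match fail, which may or may not be the behavior you want.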