For example, take the following lines of page source:
<meta property="fb:admins" content="550301723,1033888255,100000279817523" />
<meta property="og:type" content="article" />
<meta property="og:description" content="Defense secretary Leon Panetta says cyberattacks against critical infrastructure at home and abroad--some of which he called the worst to date--should spark urgent action against the hacker threat." />
I want to extract part of the second tag on the page, namely og:type.
If I use the regular expression (?<=meta property=\").+?(?=\"), the text after "property=" in all three tags gets matched. I only want the one from the second tag. How can I do that? Any advice from the experts would be appreciated.
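The behaviour described in the question can be reproduced with a short sketch. The class name LookbehindDemo, the constant HTML, and the shortened description text are illustrative assumptions and not part of the original post. It collects every match of the lookbehind pattern, which shows that picking the second tag then comes down to taking the match at index 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regular expression comes from the question.
class LookbehindDemo {
    // Shortened sample input (the description text is abbreviated here).
    static final String HTML =
        "<meta property=\"fb:admins\" content=\"550301723,1033888255,100000279817523\" />\n"
      + "<meta property=\"og:type\" content=\"article\" />\n"
      + "<meta property=\"og:description\" content=\"(description shortened)\" />";

    // Returns every property value that the lookbehind pattern matches, in order.
    static List<String> propertyValues(String html) {
        Pattern pattern = Pattern.compile("(?<=meta property=\").+?(?=\")");
        Matcher matcher = pattern.matcher(html);
        List<String> values = new ArrayList<>();
        while (matcher.find()) {
            values.add(matcher.group());
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> values = propertyValues(HTML);
        System.out.println(values);        // all three property values
        System.out.println(values.get(1)); // the second one: og:type
    }
}
```

Selecting by index only works as long as the tag order on the page does not change, so this is a sketch of the problem and not a robust solution.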
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaPropertyDemo {
    public static void main(String[] args) {
        // The three tags are concatenated with no whitespace between them.
        String content = "<meta property=\"fb:admins\" " +
                "content=\"550301723,1033888255,100000279817523\" />" +
                "<meta property=\"og:type\" content=\"article\" />" +
                "<meta property=\"og:description\"" +
                " content=\"Defense secretary Leon Panetta says" +
                " cyberattacks against critical infrastructure at " +
                "home and abroad--some of which he called the worst" +
                " to date--should spark urgent action against the hacker threat.\" />";
        // One whole meta tag, then the property value of the tag right after it.
        Pattern pattern = Pattern.compile("<meta[^/]+?/><meta property=\"([^\"]+)\"");
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println(matcher.group(1)); // prints og:type
        }
    }
}
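<meta[^/]+?/>  // first part: matches one complete meta tag, from "<meta" through its closing "/>"
<meta property=\"([^\"]+)\"  // second part: continues from where the first part ended and captures only the text inside the double quotes of property="" in the next tag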
But I don't understand why printing group(1) outputs the value from the second tag and none of the others. Could you explain? Thanks!
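The pattern matches two consecutive meta tags, and the property value of the second meta tag is captured by the parenthesized group ([^\"]+), so group(1) returns it. The first tag is part of the match but is not inside any group, so it is not printed. After that match only the og:description tag is left, and no further meta tag follows it, so find() does not succeed a second time.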
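To make the difference between the whole match and the captured group visible, here is a minimal sketch that records group(0) next to group(1). The class name ConsecutiveMetaDemo, the constant CONTENT, and the shortened description text are illustrative assumptions; the pattern is the one from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern is taken from the answer above.
class ConsecutiveMetaDemo {
    // Tags are concatenated without whitespace, as in the answer's string.
    static final String CONTENT =
        "<meta property=\"fb:admins\" content=\"550301723,1033888255,100000279817523\" />"
      + "<meta property=\"og:type\" content=\"article\" />"
      + "<meta property=\"og:description\" content=\"(description shortened)\" />";

    // Returns one "group(0)" line and one "group(1)" line per successful find().
    static List<String> describeMatches(String html) {
        Pattern pattern = Pattern.compile("<meta[^/]+?/><meta property=\"([^\"]+)\"");
        Matcher matcher = pattern.matcher(html);
        List<String> lines = new ArrayList<>();
        while (matcher.find()) {
            lines.add("group(0) = " + matcher.group(0)); // everything the pattern consumed
            lines.add("group(1) = " + matcher.group(1)); // only the parenthesized part
        }
        return lines;
    }

    public static void main(String[] args) {
        describeMatches(CONTENT).forEach(System.out::println);
    }
}
```

With this input the loop body runs once: group(0) spans the whole fb:admins tag plus the start of the og:type tag, while group(1) is just og:type. Note that the pattern requires the second tag to start immediately after the first tag's "/>", so on a page where the tags are separated by line breaks it would presumably need something like \s* between the two parts.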