Recursive patterns

Consider the problem of matching a string in parentheses, allowing for unlimited nested parentheses. Without the use of recursion, the best that can be done is to use a pattern that matches up to some fixed depth of nesting. It is not possible to handle an arbitrary nesting depth. Perl 5.6 has provided an experimental facility that allows regular expressions to recurse (among other things). The special item (?R) is provided for the specific case of recursion. This PCRE pattern solves the parentheses problem (assume the PCRE_EXTENDED option is set so that white space is ignored):

\( ( (?>[^()]+) | (?R) )* \)

First it matches an opening parenthesis. Then it matches any number of substrings which can either be a sequence of non-parentheses, or a recursive match of the pattern itself (i.e. a correctly parenthesized substring). Finally there is a closing parenthesis.

This particular example pattern contains nested unlimited repeats, and so the use of a once-only subpattern for matching strings of non-parentheses is important when applying the pattern to strings that do not match. For example, when it is applied to

(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()

it yields "no match" quickly. However, if a once-only subpattern is not used, the match runs for a very long time indeed because there are so many different ways the + and * repeats can carve up the subject, and all have to be tested before failure can be reported.

The values set for any capturing subpatterns are those from the outermost level of the recursion at which the subpattern value is set. If the pattern above is matched against

(ab(cd)ef)

the value for the capturing parentheses is "ef", which is the last value taken on at the top level. If additional parentheses are added, giving

\( ( ( (?>[^()]+) | (?R) )* ) \)

then the string they capture is "ab(cd)ef", the contents of the top level parentheses.
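A minimal PHP sketch of the two patterns above, assuming PHP 4.3.3 or later. The /x modifier is PHP's counterpart of PCRE_EXTENDED, so the spaces inside the patterns are ignored; the variable names are chosen only for illustration. It shows which value each capturing group ends up holding after matching (ab(cd)ef).

```php
<?php
// The /x modifier plays the role of PCRE_EXTENDED: whitespace in the pattern is ignored.
$subject = '(ab(cd)ef)';

// Group 1 keeps only the last value it took at the top level of the recursion.
$pattern = '/\( ( (?>[^()]+) | (?R) )* \)/x';
preg_match($pattern, $subject, $m);
echo $m[0], "\n"; // (ab(cd)ef)   the whole match
echo $m[1], "\n"; // ef           last top-level value of group 1

// An extra pair of parentheses around the repeat captures all of the contents.
$pattern2 = '/\( ( ( (?>[^()]+) | (?R) )* ) \)/x';
preg_match($pattern2, $subject, $m2);
echo $m2[1], "\n"; // ab(cd)ef
```

The once-only group (?>[^()]+) is what keeps a failing attempt from re-splitting each run of text in every possible way.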
If there are more than 15 capturing parentheses in a pattern, PCRE has to obtain extra memory to store data during a recursion, which it does by using pcre_malloc, freeing it via pcre_free afterwards. If no memory can be obtained, it saves data for the first 15 capturing parentheses only, as there is no way to give an out-of-memory error from within a recursion.

Since PHP 4.3.3, (?1), (?2) and so on can be used for recursive subpatterns too. It is also possible to use named subpatterns: (?P>foo).

If the syntax for a recursive subpattern reference (either by number or by name) is used outside the parentheses to which it refers, it operates like a subroutine in a programming language. An earlier example pointed out that the pattern

(sens|respons)e and \1ibility

matches "sense and sensibility" and "response and responsibility", but not "sense and responsibility". If instead the pattern

(sens|respons)e and (?1)ibility

is used, it does match "sense and responsibility" as well as the other two strings. Such references must, however, follow the subpattern to which they refer.
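The difference between a back-reference and a subroutine call can be checked with a short PHP sketch (PHP 4.3.3 or later, per the text above). \1 must repeat the exact text group 1 captured, while (?1) runs group 1's pattern again. The third pattern does the same with a named group; the name stem is our own choice for illustration, (?P<name>...) defines the group and (?P>name) calls it.

```php
<?php
// \1 repeats the captured text; (?1) and (?P>stem) re-run the group's pattern.
$backref    = '/(sens|respons)e and \1ibility/';
$subroutine = '/(sens|respons)e and (?1)ibility/';
$named      = '/(?P<stem>sens|respons)e and (?P>stem)ibility/'; // "stem" is an illustrative name

$subjects = array('sense and sensibility', 'response and responsibility', 'sense and responsibility');
foreach ($subjects as $s) {
    printf("%-28s \\1: %d  (?1): %d  (?P>stem): %d\n",
        $s, preg_match($backref, $s), preg_match($subroutine, $s), preg_match($named, $s));
}
// Only "sense and responsibility" differs: \1 gives 0, the two subroutine calls give 1.
```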
Could you write up an explanation of this? I think it's really good. Then add it to the FAQ so it's easy to look up.
I'm also getting this problem here: Warning: Compilation failed: unrecognized character after (? at offset 37
Check the requirement: "Since PHP 4.3.3, (?1), (?2) and so on can be used for recursive subpatterns too. It is also possible to use named subpatterns: (?P>foo)." PHP 4.3.3 is the minimum version required. Please upgrade your PHP.
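Sorry, my description really was inaccurate:
1. I originally tested on Windows XP, Apache2 and PHP Version 5.1.4. The problem there was that as soon as I ran the program, I got the message "Apache HTTP Server has encountered a problem and needs to close. We are sorry for the inconvenience." and then the service terminated.
2. Later I moved the program to another environment, where the PHP version really is older than 4.3.3, and got the error above: Warning: Compilation failed: unrecognized character after (? at offset 37.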
<?php
$content = "";
$content .= "<table width=\"200\" border=\"0\">\n";
$content .= " <tr>\n";
$content .= " <td>111</td>\n";
$content .= " <td> </td>\n";
$content .= " <td> </td>\n";
$content .= " </tr>\n";
$content .= " <tr>\n";
$content .= " <td> </td>\n";
$content .= " <td> </td>\n";
$content .= " <td> </td>\n";
$content .= " </tr>\n";
$content .= " <tr>\n";
$content .= " <td> </td>\n";
$content .= " <td> </td>\n";
$content .= " <td> </td>\n";
$content .= " </tr>\n";
$content .= "</table>\n";
$content .= "<table width=\"200\" border=\"0\">\n";
$content .= " <tr>\n";
$content .= " <td>222</td>\n";
$content .= " <td> </td>\n";
$content .= " <td> </td>\n";
$content .= " </tr>\n";
$content .= " <tr>\n";
$content .= " <td> </td>\n";
$content .= " <td> </td>\n";
$content .= " <td> </td>\n";
$content .= " </tr>\n";
$content .= " <tr>\n";
$content .= " <td> </td>\n";
$content .= " <td> </td>\n";
$content .= " <td> </td>\n";
$content .= " </tr>\n";
$content .= "</table>\n";
$content .= "<table width=\"200\" border=\"0\">\n";
$content .= " <tr>\n";
$content .= " <td>333</td>\n";
$content .= " <td> </td>\n";
$content .= " <td> </td>\n";
$content .= " </tr>\n";
$content .= " <tr>\n";
$content .= " <td> </td>\n";
$content .= " <td> </td>\n";
$content .= " <td> </td>\n";
$content .= " </tr>\n";
$content .= " <tr>\n";
$content .= " <td> </td>\n";
$content .= " <td> </td>\n";
$content .= " <td> </td>\n";
$content .= " </tr>\n";
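$content .= "</table>";
// print $content;
// Capture each <table>...</table> block (nested tables are reached via (?1))
// inside a zero-width lookahead, recording byte offsets with PREG_OFFSET_CAPTURE.
preg_match_all("#(?=(<table(?:(?:[^<]|<(?!table))*?|(?1))*</table>))#is", $content, $ar, PREG_OFFSET_CAPTURE);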
print_r($ar);
?>
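However, I found that it doesn't seem to be entirely a problem with the expression.
With $content replaced by an empty string, it works fine.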
Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] =>
                    [1] => 0
                )
            [1] => Array
                (
                    [0] =>
                    [1] => 262
                )
            [2] => Array
                (
                    [0] =>
                    [1] => 524
                )
        )
    [1] => Array
        (
            [0] => Array
                (
                    [0] => <table width="200" border="0">
<tr>
<td>111</td>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
<td> </td>
</tr>
</table>
                    [1] => 0
                )
            [1] => Array
                (
                    [0] => <table width="200" border="0">
<tr>
<td>222</td>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
<td> </td>
</tr>
</table>
                    [1] => 262
                )
            [2] => Array
                (
                    [0] => <table width="200" border="0">
<tr>
<td>333</td>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
<td> </td>
</tr>
</table>
                    [1] => 524
                )
        )
)
$content1 fails and $content2 works. $content2 is $content1 with the spaces removed; the $content1 markup was copied from Dreamweaver. Could some particular character be causing a problem?
$content1 = "<table width=\"200\" border=\"0\"> <tr> <td>333</td> <td> </td> <td> </td> </tr> <tr> <td> </td> <td> </td> <td> </td> </tr> <tr> <td> </td> <td> </td> <td> </td> </tr></table>";
$content2 = "<table width=\"200\" border=\"0\"><tr><td>333</td><td> </td><td> </td></tr><tr><td> </td><td> </td><td> </td></tr><tr><td> </td><td> </td><td> </td></tr></table>";
Apache/2.2.2 (Win32) PHP/5.1.1 mod_ssl/2.2.0 OpenSSL/0.9.8a
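What I am describing is the observed behavior and result; the actual cause is unknown.
I have always used this environment and have not found any other problems with it.
The exact symptom: as soon as the script runs, an error window pops up saying:
Apache HTTP Server has encountered a problem and needs to close. We are sorry for the inconvenience.
The service then terminates and has to be restarted from Services before it can be used again. Further testing turned up something even stranger: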
In the string "<table width=\"200\" border=\"0\">$1<tr>$2<td>", the error occurs whenever positions $1 and $2 both contain two or more spaces.
$content2 = "<table width=\"200\" border=\"0\"><tr><td>333</td><td> </td><td> </td></tr><tr><td> </td><td> </td><td> </td></tr><tr><td> </td><td> </td><td> </td></tr></table>";
Array
(
    [0] => Array
        (
            [0] =>
        )
    [1] => Array
        (
            [0] => <table width="200" border="0"><tr><td>333</td><td> </td><td> </td></tr><tr><td> </td><td> </td><td> </td></tr><tr><td> </td><td> </td><td> </td></tr></table>
        )
)
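The thread never pins down why Apache crashed. One plausible cause, and it is only an assumption here, is that the original pattern advances one character per iteration of (?:[^<]|<(?!table))*? inside another *, so PCRE's internal recursion depth grows with the length of the subject until the Apache thread's stack runs out. Following the manual's advice about once-only subpatterns, below is a minimal sketch of an alternative pattern that consumes whole runs of non-"<" text atomically and recurses with (?R) only for nested tables. It has not been tried on the Windows setup reported above. make_table() is a hypothetical helper that rebuilds three 3x3 tables like the ones in the thread, and preg_last_error() needs PHP 5.2 or later.

```php
<?php
// Sketch only; assumes the crash is caused by deep PCRE recursion.
//   <table\b          the opening tag of a table
//   (?>[^<]+)         a run of text without "<", taken in one atomic step
//   <(?!/?table\b)    any other tag such as <tr> or </td>, but not <table or </table
//   (?R)              a nested table, matched by recursing into the whole pattern
$pattern = '#<table\b(?:(?>[^<]+)|<(?!/?table\b)|(?R))*</table>#i';

// Hypothetical helper: one 3x3 table like those in the thread, label in the first cell.
function make_table($label)
{
    $html = "<table width=\"200\" border=\"0\">\n";
    for ($row = 0; $row < 3; $row++) {
        $first = ($row === 0) ? $label : ' ';
        $html .= "  <tr>\n    <td>$first</td>\n    <td> </td>\n    <td> </td>\n  </tr>\n";
    }
    return $html . "</table>\n";
}

$content = make_table('111') . make_table('222') . make_table('333');

$count = preg_match_all($pattern, $content, $ar, PREG_OFFSET_CAPTURE);
if ($count === false) {
    // preg_match_all() returns false on an internal PCRE error (for example a
    // recursion limit being hit); preg_last_error() says which one.
    die('preg_match_all failed, preg_last_error() = ' . preg_last_error() . "\n");
}
foreach ($ar[0] as $match) {
    // $match[0] is the table markup, $match[1] its byte offset in $content.
    printf("table at offset %d, %d bytes\n", $match[1], strlen($match[0]));
}
```

Because the three alternatives can never start at the same character, a failing attempt has nothing to re-split, and a nested table is returned as part of its enclosing table rather than as a separate match.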