Without using a regex:
$s = <<<AAA
<div id="c1">标题1</div>
<div id="123">标题2</div>
<div id="c_-3">标题3</div>
<div id="总">标题4</div>
AAA;
print strip_tags($s);
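If the titles are needed as a list rather than one printed string, the strip_tags() result can be split per line. This is only a rough sketch: the variable names $text and $titles are made up here, and it assumes each div sits on its own line as in the sample.

```php
<?php
$s = <<<AAA
<div id="c1">标题1</div>
<div id="123">标题2</div>
<div id="c_-3">标题3</div>
<div id="总">标题4</div>
AAA;

// strip_tags() drops the markup; what remains is one title per line.
$text = strip_tags($s);

// Split on newlines, trim stray whitespace (including \r), drop empty lines.
$titles = array_values(array_filter(array_map('trim', explode("\n", $text)), 'strlen'));

print_r($titles);
```

Note that this only recovers the text. Attribute values such as href are thrown away together with the tags, so it does not help when the address is needed too.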
$str = "<div id='c1'>标题1</div>
<div id='123'>标题2</div>
<div id='c_-3'>标题3</div>
<div id='总'>标题4</div>";
$p = "/<div id='(.*?)'>(.*?)<\/div>/";
echo preg_match_all($p, $str, $match);
echo "<br>";
print_r($match);
?>
<a href="1.html" id="title1">标题1</a>
<a href="2.html" id="title2">标题2</a>
<a href="3.html" id="title3">标题3</a>
<a href="4.html" id="title4">标题4</a>
<a href="5.html" id="title5">标题5</a>
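<a href="6.html" id="title6">标题6</a>
When scraping a site and running into a list like the one above, we need a regular expression that pulls out the address and the title. The content of id is a "variable" and is not something we want to collect, but the expression still has to match it. Sometimes this "variable" contains digits, letters, Chinese characters and so on, so we need an expression that matches every kind of "variable" without affecting the collected data.
If we wrap it in (), it affects the collected data, because it adds one more capture group.
If we instead match up to the character right after the "variable" (in the HTML above, that is the " character), the regex becomes
[PHP code]/<a href=\"(.+?)\" id=\"[^\"]*\">(.+?)<\/a>/s[/code]. Isn't that a bit cumbersome?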
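For comparison, here is a minimal sketch of the two usual ways to match the id without capturing it: a negated character class, or a non-capturing group (?:...). The variable names and the mixed sample ids are made up for illustration and are not from the original list.

```php
<?php
// Illustrative input: the ids deliberately mix letters, digits,
// punctuation and a Chinese character.
$html = <<<HTML
<a href="1.html" id="title1">标题1</a>
<a href="2.html" id="123">标题2</a>
<a href="3.html" id="c_-3">标题3</a>
<a href="4.html" id="总">标题4</a>
HTML;

// Option 1: [^"]* consumes the id value without creating a group.
$negated = '/<a href="([^"]+)" id="[^"]*">(.+?)<\/a>/s';

// Option 2: (?:...) groups without capturing, so href and title
// keep group numbers 1 and 2.
$nonCapturing = '/<a href="([^"]+)" id="(?:.*?)">(.+?)<\/a>/s';

preg_match_all($negated, $html, $m1);
preg_match_all($nonCapturing, $html, $m2);

print_r($m1[1]); // addresses
print_r($m1[2]); // titles
var_dump($m1 === $m2); // both patterns give the same groups on this input
```

The negated class is usually the simpler choice here, since it can never run past the closing quote of the attribute.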
<a href="1.html" id="title1">标题1</a>
<a href="2.html" id="title2">标题2</a>
<a href="3.html" id="title3">标题3</a>
<a href="4.html" id="title4">标题4</a>
<a href="5.html" id="title5">标题5</a>
<a href="6.html" id="title6">标题6</a>
What output do you want?
$data='<a href="1.html" id="title1">标题1</a>
<a href="2.html" id="title2">标题2</a>
<a href="3.html" id="title3">标题3</a>
<a href="4.html" id="title4">标题4</a>
<a href="5.html" id="title5">标题5</a>
<a href="6.html" id="title6">标题6</a>';
preg_match_all("/<a href=\"(.+?)\" id=\"[^\"]*\">(.+?)<\/a>/is",$data,$match);
print_r($match);

Array
(
    [0] => Array
        (
            [0] => <a href="1.html" id="title1">标题1</a>
            [1] => <a href="2.html" id="title2">标题2</a>
            [2] => <a href="3.html" id="title3">标题3</a>
            [3] => <a href="4.html" id="title4">标题4</a>
            [4] => <a href="5.html" id="title5">标题5</a>
            [5] => <a href="6.html" id="title6">标题6</a>
        )

    [1] => Array
        (
            [0] => 1.html
            [1] => 2.html
            [2] => 3.html
            [3] => 4.html
            [4] => 5.html
            [5] => 6.html
        )

    [2] => Array
        (
            [0] => 标题1
            [1] => 标题2
            [2] => 标题3
            [3] => 标题4
            [4] => 标题5
            [5] => 标题6
        )

)
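If one record per link is easier to work with than the parallel arrays, something like the following might do. It is a sketch only: the group names href and title and the $links variable are my own choices, and the input is shortened to two links.

```php
<?php
$data = '<a href="1.html" id="title1">标题1</a>
<a href="2.html" id="title2">标题2</a>';

// Named groups (?<name>...) plus PREG_SET_ORDER give one array per link.
$pattern = '/<a href="(?<href>[^"]+)" id="[^"]*">(?<title>.+?)<\/a>/is';
preg_match_all($pattern, $data, $matches, PREG_SET_ORDER);

$links = [];
foreach ($matches as $match) {
    // Keep only the named entries; the numeric duplicates are ignored.
    $links[] = ['href' => $match['href'], 'title' => $match['title']];
}

print_r($links);
```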
Title only: /<a href=\"[^\"]*\" id=\"[^\"]*\">(.+?)<\/a>/is
Address only: /<a href=\"(.+?)\" id=\"[^\"]*\">[^<]*<\/a>/is
$s = <<<AAA
<a href="1.html" id="title1">标题1</a>
<a href="2.html" id="title2">标题2</a>
<a href="3.html" id="title3">标题3</a>
<a href="4.html" id="title4">标题4</a>
<a href="5.html" id="title5">标题5</a>
<a href="6.html" id="title6">标题6</a>
AAA;
if (preg_match_all("/<a\s+href=['\"]?(.*?)['\"]?\s+[^>]+>([^<]+)<.*/i", $s, $m)) {
print_r($m[1]);
print_r($m[2]);
}
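header("Content-type:text/html;charset=utf-8");
$s = '<div id="c1">标题1</div>
<div id="123">标题2</div>
<div id="c_-3">标题3</div>
<div id="总">标题4</div>';
preg_match_all("/<div id=\"[a-z0-9]+\">(.*?)<\/div>/is", $s, $cc);
print_r($cc);
[a-z0-9] matches digits and letters, and whatever is inside () is one capture group.
In /is, the i makes the match case-insensitive and the s lets . match newlines as well. Note that [a-z0-9]+ does not match ids such as c_-3 or 总, so only the first two divs above are captured.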
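To see how the choice of character class changes what gets matched, the two patterns can be run side by side. A small sketch, with $strict and $loose as made-up variable names:

```php
<?php
$s = '<div id="c1">标题1</div>
<div id="123">标题2</div>
<div id="c_-3">标题3</div>
<div id="总">标题4</div>';

// Restrictive class: only ids made of ASCII letters and digits match.
preg_match_all('/<div id="[a-z0-9]+">(.*?)<\/div>/is', $s, $strict);

// Permissive class: any id value that contains no double quote matches.
preg_match_all('/<div id="[^"]*">(.*?)<\/div>/is', $s, $loose);

print_r($strict[1]); // 标题1, 标题2
print_r($loose[1]);  // all four titles
```

If the id can really be anything, [^"]* is probably the safer class to use.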