After looking into this again, I finally got it working. The idea: first strip all HTML and JavaScript tags, then truncate the resulting string. The code:
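<?php
// Strip HTML and JavaScript from a string and decode a few common entities.
function KickHtmlAndJs($document)
{
    $document = trim($document);
    if (strlen($document) <= 0) {
        return $document;
    }
    $search = array(
        "'<script[^>]*?>.*?</script>'si", // remove JavaScript blocks
        "'<[\/\!]*?[^<>]*?>'si",          // remove HTML tags
        "'([\r\n])[\s]+'",                // collapse whitespace after a line break
        "'&(quot|#34);'i",                // decode HTML entities
        "'&(lt|#60);'i",
        "'&(gt|#62);'i",
        "'&(nbsp|#160);'i",
        "'&(amp|#38);'i"                  // decoded last so "&amp;lt;" is not decoded twice
    );
    $replace = array(
        "",
        "",
        "\\1", // backreference to the captured line break ("\1" in double quotes would be chr(1))
        "\"",
        "<",
        ">",
        " ",
        "&"
    );
    return @preg_replace($search, $replace, $document);
}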
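// Strip tags and scripts, then keep the first 200 bytes.
$str = KickHtmlAndJs($rs["content"]);
$str1 = substr($str, 0, 200);
echo $str1;
?>
I haven't tested this thoroughly, but it should work for typical use.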
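One case that may be worth checking: substr() counts bytes, so with multibyte text (for example Chinese in UTF-8) a cut at 200 bytes can land in the middle of a character. Below is a minimal sketch of a character-based cut, assuming the content is UTF-8 and the mbstring extension is loaded. truncateText and its parameters are hypothetical names for illustration, not part of the code above.

```php
<?php
// Hypothetical helper: truncate by characters rather than bytes.
// Assumes UTF-8 input and that the mbstring extension is available.
function truncateText($text, $maxChars = 200, $suffix = '...')
{
    // Short enough already: return unchanged, without a suffix.
    if (mb_strlen($text, 'UTF-8') <= $maxChars) {
        return $text;
    }
    // Keep the first $maxChars characters and mark the cut.
    return mb_substr($text, 0, $maxChars, 'UTF-8') . $suffix;
}

echo truncateText('你好世界', 2); // prints 你好...
```

In the usage above, this would take the place of the substr() call, e.g. truncateText(KickHtmlAndJs($rs["content"]), 200).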
Corrections are welcome.