I want to extract the encoding from the META tags in HTML source code:
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
……
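That is, I want to get the encoding that follows charset, such as utf-8 or gb2312.
Please comment the code as well, so that I learn how to fish instead of just being handed one.
Thanks!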
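<?php
$str = '
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
';
// "charset=" is the literal text that precedes the encoding name.
// (.*?) lazily captures the encoding and stops at the first closing quote,
// which must be followed by optional whitespace, an optional slash and the tag end.
preg_match_all("/charset=(.*?)\"\s*\/?>/", $str, $matches);
// $matches[1] holds the text captured by group 1 for every match.
print_r($matches[1]); // Array ( [0] => UTF-8 [1] => utf-8 [2] => gb2312 )
?>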
I'm at an internet cafe with very limited tools, so I tested this in Notepad:
<script>
var s = '<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" /><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><meta http-equiv="Content-Type" content="text/html; charset=gb2312">';
var re = /<\w+\s*\w+[-]\w+\s*=\s*\".*?\"\s*\w+\s*=\s*[^/].*?\s*\w+\s*=\s*([^\"].*?)\"\s*\/?>/i;
var s2 = re.exec(s);
alert(s2);
</script>
In PHP, preg_match_all will match all of them.
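For comparison, here is a minimal JavaScript sketch that collects every match. `exec` without the `g` flag stops at the first tag, so this version uses a global regex with `String.prototype.matchAll` (ES2020). The helper name `extractCharsets` is made up for illustration, and the short pattern is an assumption modelled on the simpler PHP answers in this thread, not the long pattern above.

```javascript
// Hypothetical helper: returns every charset value found in the input string.
function extractCharsets(html) {
  // charset\s*=\s*  the literal "charset=", tolerating spaces around "="
  // ["']?           an optional opening quote, as in <meta charset="...">
  // ([\w-]+)        group 1: the encoding name, e.g. UTF-8 or gb2312
  // Flags: g finds all matches, i ignores case.
  const re = /charset\s*=\s*["']?([\w-]+)/gi;
  // matchAll yields one match array per hit; index 1 is the capture group.
  return Array.from(html.matchAll(re), (m) => m[1]);
}

const s =
  '<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />' +
  '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' +
  '<meta http-equiv="Content-Type" content="text/html; charset=gb2312">';

console.log(extractCharsets(s)); // [ 'UTF-8', 'utf-8', 'gb2312' ]
```

Capturing `[\w-]+` instead of `.*?` means the group can never run past the closing quote, so no trailing `\"\s*\/?>` is needed.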
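// .* is greedy, so it skips ahead to the last "charset=" on the line.
// ([^\"]+) captures everything up to, but not including, the closing quote.
$reg = '/.*charset=([^\"]+)\"/i';
$str = '<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />';
// $arr stores the match results: $arr[0] is the full match, $arr[1] the encoding.
preg_match($reg, $str, $arr);
print_r($arr);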
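<?php
$str = '
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
';
// @ is used as the delimiter, so nothing in the pattern needs escaping.
// ([^"]+) captures every character after "charset=" up to the closing quote.
// The i flag makes the match case-insensitive (charset, CHARSET, ...).
preg_match_all('@charset=([^"]+)@i', $str, $matches);
print_r($matches[1]); // Array ( [0] => UTF-8 [1] => utf-8 [2] => gb2312 )
?>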
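<?php
$s = '<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" /><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><meta http-equiv="Content-Type" content="text/html; charset=gb2312">';
// Matches the whole tag piece by piece: the tag name, the http-equiv attribute,
// the content attribute up to the semicolon, then "charset=" and the encoding (group 1).
$pattern = "/<\w+\s*\w+[-]\w+\s*=\s*\".*?\"\s*\w+\s*=\s*\"[^;].*?;\s*\w+\s*=\s*([^\"].*?)\"\s*\/?>/i";
preg_match_all($pattern, $s, $match);
// $match[0] holds the full tags, $match[1] holds the captured encodings.
print_r($match);
?>
Output: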
Array
(
    [0] => Array
        (
            [0] => <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
            [1] => <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
            [2] => <meta http-equiv="Content-Type" content="text/html; charset=gb2312">
        )

    [1] => Array
        (
            [0] => UTF-8
            [1] => utf-8
            [2] => gb2312
        )

)
var_dump($match[1]);
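Second, when matching inside a full HTML page: <meta\s+[\s\S]*?charset\W+?(?<var>[\w-]{1,})\W*?>
Explanation:
When matching against a whole HTML page, the pattern has to start with meta, because some script tags in a page also carry a charset attribute, and a script's charset is not the charset of the HTML page.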
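The point about script tags can be sketched in JavaScript. This is only an illustration under assumptions: the sample page and the name `pageCharset` are made up, the named group is called `charset` instead of `var`, and the pattern is a slightly tightened variant of the one above (`[^>]*?` in place of `[\s\S]*?`, so a match cannot run past the end of the meta tag it started in). Named groups need ES2018 or later.

```javascript
// Made-up sample page: the <script> tag carries its own charset attribute,
// which describes the external script file, not the page itself.
const page = [
  '<html><head>',
  '<script src="a.js" charset="gbk"></script>',
  '<meta name="viewport" content="width=device-width">',
  '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">',
  '</head></html>',
].join('\n');

// Naive pattern: takes the first "charset" anywhere in the document.
const naive = /charset\W+?([\w-]+)/i;

// Anchored pattern: the match must begin at "<meta" and stay inside that tag.
const anchored = /<meta\s+[^>]*?charset\W+?(?<charset>[\w-]+)\W*?>/i;

// Hypothetical helper: returns the page encoding, or null if no meta tag declares one.
function pageCharset(html) {
  const m = anchored.exec(html);
  return m ? m.groups.charset : null;
}

console.log(naive.exec(page)[1]); // gbk   (picked up from the script tag)
console.log(pageCharset(page));   // utf-8 (from the meta tag)
```

The viewport meta tag in the sample has no charset, and because `[^>]` cannot cross the closing `>`, the search moves on to the next meta tag rather than borrowing a charset from elsewhere in the page.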