Actually, I've only just read up on the preg expression syntax myself, from the English PHP4 manual, and there's still a lot I don't understand. to: wcy001, do you want to strip the HTML completely?
"/(<.+ )align=\"??center\"??(.*>)/iU" Is there something wrong with this one?
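1: I only just learned those regular expressions, so mistakes are to be expected.
2: This is definitely a beta :) I improved it again yesterday. I don't have broadband set up at home yet, so I can't post the new code right away; once it is installed tomorrow I'll be able to update faster. I also hope everyone will keep pointing out bugs so I can fix them.
3: I made this so that Chinese users could use it in place of VBB, and I hope to get a chance to promote the code (if it really turns out to be useful). Feel free to use it if you need it. For the final release I will write documentation following international conventions, and I will include everyone's names.
About those regular expressions: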
"/(<.+ )align=\"??center\"??(.*>)/iU",
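This pattern matches, inside a <> tag, anything of the form align=center or align="center".
Because the U modifier is used at the end, the whole pattern runs in ungreedy mode.
If the expression were written like this (a single ? after each \"):
"/(<.+ )align=\"?center\"?(.*>)/iU",
then the closing " would always end up in the (.*) group that follows. So given:
<table align="center" width=100...>
the result would be:
[table 居中" 宽=100...]
which clearly has an extra ".
The PHP documentation on preg says that, under U, adding one more ? after a quantifier (as my expression does) makes it greedy at that spot.
The replacement then comes out as:
[table 居中 宽=100...]
which is correct.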
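The difference between the two spellings can be checked directly. This is a minimal sketch, assuming a shortened sample tag and only the align pattern (the later tag-name and width replacements are left out); the variable names are made up for the illustration.

```php
<?php
$html = '<table align="center" width=100>';

// Under U the optional-quote quantifier (?) is lazy by default, so the
// closing quote is skipped and ends up inside (.*>).
$lazy = preg_replace('/(<.+ )align="?center"?(.*>)/iU', '${1}居中${2}', $html);

// Under U a quantifier followed by ? is greedy, so "?? consumes the
// quote whenever one is present.
$greedy = preg_replace('/(<.+ )align="??center"??(.*>)/iU', '${1}居中${2}', $html);

echo $lazy, "\n";   // <table 居中" width=100>
echo $greedy, "\n"; // <table 居中 width=100>
```

The first line of output still carries the stray quote described above; the second does not.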
The manual happens to have a similar example you can refer to:
// $document should contain an HTML document.
// This will remove HTML tags, javascript sections
// and white space. It will also convert some
// common HTML entities to their text equivalent.
$search = array ("'<script[^>]*?>.*?</script>'si", // Strip out javascript
"'<[\/\!]*?[^<>]*?>'si", // Strip out html tags
"'([\r\n])[\s]+'", // Strip out white space
"'&(quot|#34);'i", // Replace html entities
"'&(amp|#38);'i",
"'&(lt|#60);'i",
"'&(gt|#62);'i",
"'&(nbsp|#160);'i",
"'&(iexcl|#161);'i",
"'&(cent|#162);'i",
"'&(pound|#163);'i",
"'&(copy|#169);'i");
$replace = array ("",
"",
"\\1",
"\"",
"&",
"<",
">",
" ",
chr(161),
chr(162),
chr(163),
chr(169));
$text = preg_replace ($search, $replace, $document);
// The manual's last pair was "'&#(\d+);'e" => "chr(\\1)" (evaluate as php).
// The /e modifier was removed in PHP 7, so a callback does the same job here.
$text = preg_replace_callback ("'&#(\d+);'", function ($m) { return chr((int) $m[1]); }, $text);
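Ungreedy means non-greedy, i.e. minimal matching.
In ungreedy mode, you need a question mark to get a greedy match.
It is the mirror image of JS, which is greedy by default and needs a question mark for a non-greedy match (see the relevant JS documentation):
? When this character immediately follows any of the other quantifiers (*, +, ?, {n}, {n,}, {n,m}), the matching mode is non-greedy. Non-greedy mode matches as little of the searched string as possible, whereas the default greedy mode matches as much of it as possible. For example, for the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the 'o's.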
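The same "oooo" example can be run through PHP's preg functions to see all four combinations side by side. A small sketch; the variable names are only for illustration.

```php
<?php
$subject = 'oooo';

preg_match('/o+/',   $subject, $greedy);   // default mode: greedy
preg_match('/o+?/',  $subject, $lazy);     // default mode plus ?: lazy
preg_match('/o+/U',  $subject, $lazyU);    // U modifier: lazy by default
preg_match('/o+?/U', $subject, $greedyU);  // U modifier plus ?: greedy again

echo $greedy[0], ' ', $lazy[0], ' ', $lazyU[0], ' ', $greedyU[0], "\n";
// oooo o o oooo
```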
http://www.omnitech.com.cn/sys/news2/index.htm
(The company website isn't finished yet, so please don't look at the other pages, it's embarrassing :P )
<?
function OTC2HTM($str){ //OmniTech Chinese HTML code.
if(!$str)
return;
$pat=array(
//single-parameter replacements
//the keywords are grouped so that | does not split the whole pattern;
//the lookahead leaves the space or ] after the keyword in place
"/(\[.+ )(?:居中|CENTER)(?=[ \]])(.*\])/iU",
"/(\[.+ )(?:居左|LEFT)(?=[ \]])(.*\])/iU",
"/(\[.+ )(?:居右|RIGHT)(?=[ \]])(.*\])/iU",
"/(\[.+ )(?:垂直居中|MIDDLE)(?=[ \]])(.*\])/iU",
"/(\[.+ )(?:垂直置顶|TOP)(?=[ \]])(.*\])/iU",
"/(\[.+ )(?:垂直置底|BOTTOM)(?=[ \]])(.*\])/iU",
"/(\[.+ )(?:基线对齐|BASELINE)(?=[ \]])(.*\])/iU",
"/(\[.+ )(?:只读|READONLY)(?=[ \]])(.*\])/iU",
"/(\[.+ )(?:停用|DISABLED)(?=[ \]])(.*\])/iU",
"/(\[.+ )(?:多行|MULTIPLE)(?=[ \]])(.*\])/iU",
"/(\[.+ )(?:不换行|NOWRAP)(?=[ \]])(.*\])/iU",
//two-parameter replacements
"/(\[.+ )(宽|宽度|WIDTH)=([\d\"]+)(.*\])/iU",
"/(\[.+ )(高|高度|HEIGHT)=([\d\"]+)(.*\])/iU",
"/(\[.+ )(背景色|BGCOLOR)=(#?[0-9a-f\"]+)(.*\])/iU",
"/(\[.+ )(边框色|BORDERCOLOR)=([#\w\"]+?)(.*\])/iU",
"/(\[.+ )(边框亮边色|BORDERCOLORLIGHT)=(#?[0-9a-f\"]+)(.*\])/iU",
"/(\[.+ )(边框暗边色|BORDERCOLORDARK)=(#?[0-9a-f\"]+)(.*\])/iU",
"/(\[.+ )(CELLPADDING)=([\d\"]+)(.*\])/iU",
"/(\[.+ )(CELLSPACING)=([\d\"]+)(.*\])/iU",
"/(\[.+ )(边框|BORDER)=([\d\"]+)(.*\])/iU",
"/(\[.+ )(背景|BACKGROUND)=([\w\"]+)(.*\])/iU",
"/(\[.+ )(字色|颜色|COLOR)=([\S]+)(.*\])/iU",
"/(\[.+ )(字体|FACE)=([\S]+)(.*\])/iU",
"/(\[.+ )(最大长度|MAXLENGTH)=([\d\"]+)(.*\])/iU",
"/(\[.+ )(大小|长度|SIZE)=([+\-\d\"]+)(.*\])/iU",
"/(\[.+ )(类型|TYPE)=(\S)(.*\])/iU",
"/(\[.+ )(行|行数|ROWS)=([\d\"]+)(.*\])/iU",
"/(\[.+ )(列|列数|COLS)=([\d\"]+)(.*\])/iU",
"/(\[.+ )(样式类|CLASS)=([\w\"]+)(.*\])/iU",
"/(\[.+ )(样式|STYLE)=([\w\"]+)(.*\])/iU",
"/(\[.+ )(标识|ID)=([\w\"]+)(.*\])/iU",
"/(\[.+ )(名称|NAME)=([\w\"]+)(.*\])/iU",
"/(\[.+ )(文件|源文件|SRC)=([\w\"]+)(.*\])/iU",
"/(\[.+ )(注释|ALT)=([\S]+)(.*\])/iU",
"/(\[.+ )(标题|TITLE)=([\S]+)(.*\])/iU",
"/(\[.+ )(值|VALUE)=([\S]+)(.*\])/iU",
"/(\[.+ )(地址|URL|HREF)=([\S]+)(.*\])/iU",
"/(\[.+ )(目标|目标窗口|TARGET)=([\S]+)(.*\])/iU",
"/(\[.+ )(方法|METHOD)=(POST|GET)(.*\])/iU",
//tag replacements
"/(\[\/?)(图片|图|IMG)([ ]??)(.*\])/iU",
"/(\[\/?)(链接|A)([ ]??)(.*\])/iU",
"/(\[\/?)(段落|P)([ ]??)(.*\])/iU",
"/(\[\/?)(分段|层|DIV)([ ]??)(.*\])/iU",
"/(\[\/?)(SPAN)([ ]??)(.*\])/iU",
"/(\[\/?)(表格行|TR)([ ]??)(.*\])/iU",
"/(\[\/?)(表格列|TD)([ ]??)(.*\])/iU",
"/(\[\/?)(表格|TABLE)([ ]??)(.*\])/iU",
"/(\[\/?)(粗体|B)([ ]??)(.*\])/iU",
"/(\[\/?)(斜体|I)([ ]??)(.*\])/iU",
"/(\[\/?)(文字|FONT)([ ]??)(.*\])/iU",
"/(\[\/?)(列表|LI)([ ]??)(.*\])/iU",
"/(\[\/?)(输入框|INPUT)([ ]??)(.*\])/iU",
"/(\[\/?)(文本框|TEXTAREA)([ ]??)(.*\])/iU",
"/(\[\/?)(表单|FORM)([ ]??)(.*\])/iU",
"/(\[\/?)(内框架|IFRAME)([ ]??)(.*\])/iU",
"/(\[\/?)(框架|FRAME)([ ]??)(.*\])/iU",
"/\[(\/?)([a-z]+.*)\]/iUs",
//special replacements
"/\{FLASH (文件|SRC)=(.+) (宽|WIDTH)=([\d%]+) (高|HEIGHT)=([\d%]+)( wmode=(transparent))?.*\}/iU"
);
$replace=array(
//single-parameter replacements
"\\1align=\"center\"\\2",
"\\1align=\"left\"\\2",
"\\1align=\"right\"\\2",
"\\1valign=\"middle\"\\2",
"\\1valign=\"top\"\\2",
"\\1valign=\"bottom\"\\2",
"\\1valign=\"baseline\"\\2",
"\\1readonly\\2",
"\\1disabled\\2",
"\\1multiple\\2",
"\\1nowrap\\2",
//two-parameter replacements
"\\1width=\\3\\4",
"\\1height=\\3\\4",
"\\1bgcolor=\\3\\4",
"\\1bordercolor=\\3\\4",
"\\1bordercolorlight=\\3\\4",
"\\1bordercolordark=\\3\\4",
"\\1cellpadding=\\3\\4",
"\\1cellspacing=\\3\\4",
"\\1border=\\3\\4",
"\\1background=\\3\\4",
"\\1color=\\3\\4",
"\\1face=\\3\\4",
"\\1maxlength=\\3\\4",
"\\1size=\\3\\4",
"\\1type=\\3\\4",
"\\1rows=\\3\\4",
"\\1cols=\\3\\4",
"\\1class=\\3\\4",
"\\1style=\\3\\4",
"\\1id=\\3\\4",
"\\1name=\\3\\4",
"\\1src=\\3\\4",
"\\1alt=\\3\\4",
"\\1title=\\3\\4",
"\\1value=\\3\\4",
"\\1href=\\3\\4",
"\\1target=\\3\\4",
"\\1method=\\3\\4",
//tag and bracket replacements
"\\1img\\3\\4",
"\\1a\\3\\4",
"\\1p\\3\\4",
"\\1div\\3\\4",
"\\1span\\3\\4",
"\\1tr\\3\\4",
"\\1td\\3\\4",
"\\1table\\3\\4",
"\\1b\\3\\4",
"\\1i\\3\\4",
"\\1font\\3\\4",
"\\1li\\3\\4",
"\\1input\\3\\4",
"\\1textarea\\3\\4",
"\\1form\\3\\4",
"\\1iframe\\3\\4",
"\\1frame\\3\\4",
"<\\1\\2>",
//special replacements
"<object classid=\"clsid:D27CDB6E-AE6D-11cf-96B8-444553540000\" codebase=\"http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,29,0\" width=\"\\4\" height=\"\\6\"><param name=\"movie\" value=\"\\2\"><param name=\"quality\" value=\"high\"><PARAM NAME=\"wmode\" VALUE=\"\\8\"><embed src=\"\\2\" quality=\"high\" \\7 pluginspage=\"http://www.macromedia.com/go/getflashplayer\" type=\"application/x-shockwave-flash\" width=\"\\4\" height=\"\\6\"></embed></object>"
);
$str=preg_replace($pat,$replace,$str);
return finput($str,1);
}
function HTM2OTC($str){
if(!$str)
return;
$str=foutput($str);
$pat=array(
"/(<.+ )align=\"??center\"??(.*>)/iU",
"/(<.+ )align=\"??left\"??(.*>)/iU",
"/(<.+ )align=\"??right\"??(.*>)/iU",
"/(<.+ )valign=\"??middle\"??(.*>)/iU",
"/(<.+ )valign=\"??top\"??(.*>)/iU",
"/(<.+ )valign=\"??bottom\"??(.*>)/iU",
"/(<.+ )valign=\"??baseline\"??(.*>)/iU",
"/(<.+ )readonly(.*>)/iU",
"/(<.+ )disabled(.*>)/iU",
"/(<.+ )MULTIPLE(.*>)/iU",
"/(<.+ )nowrap(.*>)/iU",
"/(<.+ )width=(\"??[^\W_]+\"??)(.*>)/iU",
"/(<.+ )height=(\"??[^\W_]+\"??)(.*>)/iU",
"/(<.+ )bgcolor=(\"??#?[0-9a-f]+\"??)(.*>)/iU",
"/(<.+ )bordercolor=(\"??#?[0-9a-f]+\"??)(.*>)/iU",
"/(<.+ )bordercolorlight=(\"??#?[0-9a-f]+\"??)(.*>)/iU",
"/(<.+ )bordercolordark=(\"??#?[0-9a-f]+\"??)(.*>)/iU",
"/(<.+ )cellpadding=(\"??\d+\"??)(.*>)/iU",
"/(<.+ )cellspacing=(\"??\d+\"??)(.*>)/iU",
"/(<.+ )border=(\"??\d+\"??)(.*>)/iU",
"/(<.+ )background=(\"??\w+\"??)(.*>)/iU",
"/(<.+ )color=(\"??#?[0-9a-f]+?)(.*>)/iU",
"/(<.+ )face=(\"??\S+\"??)(.*>)/iU",
"/(<.+ )size=(\"??[+\-\d\"]+\"??)(.*>)/iU",
"/(<.+ )maxlength=(\"??\d+\"??)(.*>)/iU",
"/(<.+ )type=(\"??\w+\"??)(.*>)/iU",
"/(<.+ )rows=(\"??\d+\"??)(.*>)/iU",
"/(<.+ )cols=(\"??\d+\"??)(.*>)/iU",
"/(<.+ )class=(\"??\w+\"??)(.*>)/iU",
"/(<.+ )style=(\"??\w+\"??)(.*>)/iU",
"/(<.+ )id=(\"??\w+\"??)(.*>)/iU",
"/(<.+ )name=(\"??\w+\"??)(.*>)/iU",
"/(<.+ )src=(\"??\w+\"??)(.*>)/iU",
"/(<.+ )alt=(\"??\S+\"??)(.*>)/iU",
"/(<.+ )title=(\"??\S+\"??)(.*>)/iU",
"/(<.+ )value=(\"??\S+\"??)(.*>)/iU",
"/(<.+ )href=(\"??\S+\"??)(.*>)/iU",
"/<(\/?)img\b([ ]??)(.*)>/iU",
"/<(\/?)a\b([ ]??)(.*)>/iU",
"/<(\/?)p\b([ ]??)(.*)>/iU",
"/<(\/?)div\b([ ]??)(.*)>/iU",
"/<(\/?)span\b([ ]??)(.*)>/iU",
"/<(\/?)tr\b([ ]??)(.*)>/iU",
"/<(\/?)td\b([ ]??)(.*)>/iU",
"/<(\/?)table\b([ ]??)(.*)>/iU",
"/<(\/?)b\b([ ]??)(.*)>/iU",
"/<(\/?)i\b([ ]??)(.*)>/iU",
"/<(\/?)font\b([ ]??)(.*)>/iU",
"/<(\/?)li\b([ ]??)(.*)>/iU",
"/<(\/?)input\b([ ]??)(.*)>/iU",
"/<(\/?)textarea\b([ ]??)(.*)>/iU",
"/<(\/?)form\b([ ]??)(.*)>/iU",
"/<(\/?)iframe\b([ ]??)(.*)>/iU",
"/<(\/?)frame\b([ ]??)(.*)>/iU",
"/<(\/?)([a-z]+.*)>/isU"
);
$replace=array(
"\\1居中\\2",
"\\1居左\\2",
"\\1居右\\2",
"\\1垂直居中\\2",
"\\1垂直置顶\\2",
"\\1垂直置底\\2",
"\\1基线对齐\\2",
"\\1只读\\2",
"\\1停用\\2",
"\\1多行\\2",
"\\1不换行\\2",
"\\1宽=\\2\\3",
"\\1高=\\2\\3",
"\\1背景色=\\2\\3",
"\\1边框色=\\2\\3",
"\\1边框亮边色=\\2\\3",
"\\1边框暗边色=\\2\\3",
"\\1CELLPADDING=\\2\\3",
"\\1CELLSPACING=\\2\\3",
"\\1边框=\\2\\3",
"\\1背景=\\2\\3",
"\\1字色=\\2\\3",
"\\1字体=\\2\\3",
"\\1大小=\\2\\3",
"\\1最大长度=\\2\\3",
"\\1类型=\\2\\3",
"\\1行数=\\2\\3",
"\\1列数=\\2\\3",
"\\1样式类=\\2\\3",
"\\1样式=\\2\\3",
"\\1标识=\\2\\3",
"\\1名称=\\2\\3",
"\\1源文件=\\2\\3",
"\\1注释=\\2\\3",
"\\1标题=\\2\\3",
"\\1值=\\2\\3",
"\\1地址=\\2\\3",
"[\\1图片\\2\\3]",
"[\\1链接\\2\\3]",
"[\\1段落\\2\\3]",
"[\\1分段\\2\\3]",
"[\\1SPAN\\2\\3]",
"[\\1表格行\\2\\3]",
"[\\1表格列\\2\\3]",
"[\\1表格\\2\\3]",
"[\\1粗体\\2\\3]",
"[\\1斜体\\2\\3]",
"[\\1文字\\2\\3]",
"[\\1列表\\2\\3]",
"[\\1输入框\\2\\3]",
"[\\1文本框\\2\\3]",
"[\\1表单\\2\\3]",
"[\\1内框架\\2\\3]",
"[\\1框架\\2\\3]",
"[\\1\\2]"
);
return preg_replace($pat,$replace,$str);
}
function finput($str,$withhtml=0){
if(!$withhtml){
$str=htmlspecialchars($str);
$str=str_replace(" ","&nbsp;",$str);
$str=str_replace("\r\n","<br>",$str);
}else{
$str=str_replace(" ","&nbsp;",$str);
$str=str_replace("\r\n","<br>",$str);
//turn &nbsp; back into plain spaces inside tags
$str=preg_replace_callback("/(<)(.*)(>)/iU","repspace",$str);
//drop <br>/&nbsp; runs between adjacent tags (preg_replace, since ereg_replace was removed in PHP 7)
$str=preg_replace("/>(<br>|&nbsp;)+(<\/?[^<>]+>)/",">\\2",$str);
}
return $str;
}
function foutput($str,$mode=0){
$str=str_replace("&nbsp;"," ",$str);
$str=str_replace("<br>","\r\n",$str);
if($mode==0){
$str=str_replace("&lt;","<",$str);
$str=str_replace("&gt;",">",$str);
$str=str_replace("&quot;","\"",$str);
$str=str_replace("'","\'",$str);
//decode &amp; last so that text like &amp;lt; is not decoded twice
$str=str_replace("&amp;","&",$str);
}
return $str;
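}
function parseURL($str){
//preg_replace with the i modifier stands in for eregi_replace, which was removed in PHP 7
//first unwrap links whose text is already a URL, so they are not linked twice
$patURL="#<a [^<>]+>((http|ftp)://[^<>[:space:]]+[[:alnum:]/])</a>#i";
$replaceURL="\\1";
$str=preg_replace($patURL,$replaceURL,$str);
$patURL="#(http|ftp)://[^<>[:space:]]+[[:alnum:]/]#i";
$replaceURL="<a href=\\0 target=_blank>\\0</a>";
$str=preg_replace($patURL,$replaceURL,$str);
return $str;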
}
function repspace($array){
return $array[1].preg_replace("/&nbsp;/iU"," ",$array[2]).$array[3];
}
?>
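The long pattern lists in OTC2HTM() could also be written as lookup tables walked by a single callback. What follows is only a rough sketch of that alternative, not the code above: `otc_sketch` and its three tiny tables are made up for the illustration, only a handful of keywords are covered, matching is case-sensitive, and attribute values containing spaces are not handled.

```php
<?php
// Hypothetical, heavily reduced version of the keyword tables.
function otc_sketch($str)
{
    $tags  = array('表格' => 'table', '粗体' => 'b');
    $flags = array('居中' => 'align="center"', '居左' => 'align="left"');
    $attrs = array('宽' => 'width', '高' => 'height');

    // One pass over every [...] tag; group 1 is the optional slash,
    // group 2 is everything else between the brackets.
    return preg_replace_callback(
        '/\[(\/?)([^\[\]]+)\]/',
        function ($m) use ($tags, $flags, $attrs) {
            $parts = preg_split('/ +/', trim($m[2]));
            $name  = array_shift($parts);
            $name  = isset($tags[$name]) ? $tags[$name] : $name;
            foreach ($parts as $i => $part) {
                if (isset($flags[$part])) {
                    // bare keyword such as 居中
                    $parts[$i] = $flags[$part];
                } elseif (strpos($part, '=') !== false) {
                    // key=value pair such as 宽=100
                    list($key, $value) = explode('=', $part, 2);
                    $key = isset($attrs[$key]) ? $attrs[$key] : $key;
                    $parts[$i] = $key . '=' . $value;
                }
            }
            array_unshift($parts, $name);
            return '<' . $m[1] . implode(' ', $parts) . '>';
        },
        $str
    );
}

echo otc_sketch('[表格 居中 宽=100][/表格]'), "\n";
// <table align="center" width=100></table>
```

Unknown tag names and keywords pass through unchanged, so extending the sketch is a matter of adding rows to the tables rather than adding patterns.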