I have the following array:
Array
(
    [oReturn-Path] =>
    [Delivered-To] => [email protected]
    [Received] => (ncmail 17263 invoked by uid 401); 01 Feb 2012 01:39:57 -0000
    [Message-ID] => <[email protected]>
    [Date] => Wed, 01 Feb 2012 09:39:57 +0800
    [From] => [email protected]
    [MIME-Version] => 1.0
    [Subject] => subject
    [To] => [email protected]
    [Content-Type] => multipart/mixed; boundary="------------1463804415-2084078365-1328060397=:2673"
    [Time] => 2012-02-01 09:39:57
    [Size] => 1.12 KB
    [Attachs] => Array
        (
            [attach] => Array
                (
                    [0] => Array
                        (
                            [id] => 2
                            [name] => attach.txt
                            [level] => 非密
                        )
                )
        )
    [bodyId] => 1
)
which is compiled into this XML:
<data>
<^LReturn-Path><[email protected]></^LReturn-Path>
<Delivered-To>[email protected]</Delivered-To>
<Received>(ncmail 17263 invoked by uid 401); 01 Feb 2012 01:39:57 -0000</Received>
<Message-ID><[email protected]></Message-ID>
<Date>Wed, 01 Feb 2012 09:39:57 +0800</Date>
<From>[email protected]</From>
<MIME-Version>1.0</MIME-Version>
<Subject>subject</Subject>
<To>[email protected]</To>
<Content-Type>multipart/mixed; boundary="------------1463804415-2084078365-1328060397=:2673"</Content-Type>
<Time>2012-02-01 09:39:57</Time>
<Size>1.12 KB</Size>
<Attachs>
<attach>
<id>2</id>
<name>attach.txt</name>
<level>非密</level>
</attach>
</Attachs>
<bodyId>1</bodyId>
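</data>
When I read the XML file back and convert it into an array, it fails at <^LReturn-Path> because the tag name contains binary bytes. How can I avoid this, or is there a way to read it anyway?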
XML error: Invalid character at line 66. Any regex experts out there who can help?
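Note: a key in a PHP associative array can be any string you can think of, but ^LReturn-Path does not follow the basic naming rules for XML elements, so it has to be handled specially when you parse it.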
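A minimal sketch of such special handling, applied to each key before it is written as a tag: keep only the ASCII subset of XML name characters (letters, digits, _, - and .) and prefix _ when the result is empty or does not start with a letter or _. The helper name xml_safe_name is hypothetical, and restricting names to ASCII is a simplification (XML itself also allows many Unicode letters, including Chinese, in names). The colon is dropped too, because it carries namespace meaning.

```php
<?php
/**
 * Hypothetical helper: turn an arbitrary PHP array key into a usable XML
 * element name. Keeps only ASCII letters, digits, '_', '-' and '.', which
 * also drops control bytes such as "\x01" or "\x0C" (^L).
 */
function xml_safe_name($key): string
{
    $name = preg_replace('/[^A-Za-z0-9_.-]/', '', (string) $key);

    // A name must not be empty and must start with a letter or '_';
    // digits, '-' and '.' are only allowed after the first character.
    if ($name === '' || !preg_match('/^[A-Za-z_]/', $name)) {
        $name = '_' . $name;
    }
    return $name;
}

echo xml_safe_name("\x01o\x03\x01Return-Path"), "\n"; // oReturn-Path
echo xml_safe_name('3Burn-Path'), "\n";               // _3Burn-Path
```

The leftover "o" is kept because it is a perfectly legal name character; only the surrounding control bytes are the real problem.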
How do I replace the "invalid characters" in the XML with spaces? What regex should I apply before compiling the array into XML?
$xml=<<<XML
<data>
<^LReturn-Path><[email protected]></^LReturn-Path>
<Delivered-To>[email protected]</Delivered-To>
........
<bodyId>1</bodyId>
</data>
XML;
$xml = str_replace('^', '', $xml);
$obj=simplexml_load_string($xml);
print_r($obj);
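The str_replace('^', ...) suggestion only helps if the file really contains a caret. ^L is also the usual caret notation editors use to display the form-feed control byte 0x0C (L is the 12th letter), and the later remark that ^L "looks readable" yet is still rejected fits that reading. Assuming that interpretation, a quick check shows the difference:

```php
<?php
// Assumption: what the editor shows as "^L" is the single byte 0x0C (form feed).
$name = "\fReturn-Path";

// Removing a literal caret changes nothing: the string contains no '^'.
$viaCaret = str_replace('^', '', $name);

// Removing the control byte itself leaves a valid element name.
$viaByte = str_replace("\f", '', $name);

var_dump($viaCaret === $name); // bool(true)
echo $viaByte, "\n";           // Return-Path
```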
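<oReturn-Path>: the o here is also an invalid character, which is really frustrating.
If I just want to pick out everything that is not an English letter, a Chinese character or an ordinary symbol and replace it with " ", how should the regex be written?
Although ^L looks readable, reading the file still reports it as invalid...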
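For the whitelist question, a sketch using PCRE Unicode properties: keep Latin letters (\p{Latin}), Chinese characters (\p{Han}), digits (\p{N}), punctuation (\p{P}), symbols (\p{S}) and plain space, tab, CR and LF, and replace everything else with a space. It assumes UTF-8 text, and keep_readable is a hypothetical name. It cleans text only: a key cleaned this way can still be an invalid tag name, for example if it starts with a digit or contains a space.

```php
<?php
/**
 * Hypothetical helper: replace every character that is not a Latin letter,
 * a Han (Chinese) character, a digit, punctuation, a symbol, or a plain
 * space/tab/CR/LF with a single space. Requires UTF-8 input.
 */
function keep_readable(string $s): string
{
    // \s is avoided on purpose: it would also match the form feed (^L, 0x0C).
    $out = preg_replace('/[^\p{Latin}\p{Han}\p{N}\p{P}\p{S} \t\r\n]/u', ' ', $s);
    if ($out === null) {
        // With /u, preg_replace() returns null when $s is not valid UTF-8.
        throw new InvalidArgumentException('input is not valid UTF-8');
    }
    return $out;
}

echo keep_readable("a\x01b\x0Cc 非密"), "\n"; // a b c 非密
```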
<mailindex>
<datas>
<data>
<oReturn-Path><[email protected]></oReturn-Path>
<Delivered-To>[email protected]</Delivered-To>
<Received>(ncmail 17263 invoked by uid 401); 01 Feb 2012 01:39:57 -0000</Received>
<Message-ID><[email protected]></Message-ID>
<Date>Wed, 01 Feb 2012 09:39:57 +0800</Date>
<From>[email protected]</From>
<MIME-Version>1.0</MIME-Version>
<Subject>subject</Subject>
<To>[email protected]</To>
<Content-Type>multipart/mixed; boundary="------------1463804415-2084078365-1328060397=:2673"</Content-Type>
<Time>2012-02-01 09:39:57</Time>
<Size>1.12 KB</Size>
<Attachs>
<attach>
<id>2</id>
<name>attach.txt</name>
<level>非密</level>
</attach>
</Attachs>
<bodyId>1</bodyId>
</data>
<data>
<^LReturn-Path><[email protected]></^LReturn-Path>
<Delivered-To>[email protected]</Delivered-To>
<Received>by 10.180.84.98 with HTTP; Sun, 29 Jan 2012 19:05:12 -0800 (PST)</Received>
<DKIM-Signature>v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:sub
ject:from:to :content-type; bh=6HSCkxH4EtEI801Yan/oYF7mNVUNULbL3owo3o9Z0YQ=; b=iD0DjhMG1dRC/iJKRr7FnrtXXieSfbBEZXt58MCmpyaYOBBvLjS4/s4DWHph9DxWBR OBD0Ge/u2za
ab0LoB95D7kggOMXUWOcL9iG0VgOpn2qdgbryGs2+hasVaZ8iTkNgFoXb k1JqJmLp8zu0ESJ54JwpizIv9VwkBDQGkSQik=</DKIM-Signature>
<MIME-Version>1.0</MIME-Version>
<In-Reply-To><CAExYbSWP_XvcJRhj3yMSOOr8h4dLTdbk_d8W1dMeVcmYZ=PpYg@mail.gmail.com></In-Reply-To>
<References><CAExYbSXhNw1f1ks8LkPi01aa64iUZsMXPkiHjJ_dkFCLu-0fPg@mail.gmail.com> <CAExYbSWP_XvcJRhj3yMSOOr8h4dLTdbk_d8W1dMeV
[email protected]></References>
<Date>Mon, 30 Jan 2012 11:05:12 +0800</Date>
<Message-ID><CAExYbSVcMscurCqoydUpVHwGqv012o40jFYQ=hhwScQKE1eoyQ@mail.gmail.com></Message-ID>
<Subject>Fwd: test gmail</Subject>
<From>sean yan <[email protected]></From>
<To>[email protected]</To>
<Content-Type>multipart/mixed; boundary=f46d0444eab777d31604b7b61d56</Content-Type>
<Time>2012-01-30 11:05:12</Time>
<Size>120.92 KB</Size>
<Attachs>
<attach>
<id>2</id>
<name>.jpg</name>
<level>非密</level>
</attach>
<attach>
<id>3</id>
<name>.jpg</name>
<level>非密</level>
</attach>
<attach>
<id>4</id>
<name>.jpg</name>
<level>非密</level>
</attach>
</Attachs>
<bodyId>1.1</bodyId>
</data>
</datas>
</mailindex>
I used this:
// Filter out invalid characters
function XmlSafeStr($s) {
    return preg_replace("/[\\x00-\\x08\\x0b-\\x0c\\x0e-\\x1f]/", '', $s);
}
but there is still an error: XML error: XML_ERR_NAME_REQUIRED at line 50
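XmlSafeStr() removes ASCII control bytes, which covers the bytes seen in this thread. A stricter sketch follows the XML 1.0 Char production, which also excludes characters such as U+FFFE and U+FFFF. It is meant for element text, though: XML_ERR_NAME_REQUIRED is typically libxml2's complaint that a < is not followed by a valid element name, so cleaning the values alone will not fix it if the bad bytes sit in the keys. The helper name is hypothetical and the input is assumed to be UTF-8.

```php
<?php
/**
 * Hypothetical helper: drop every character outside the XML 1.0 "Char"
 * production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
 * [#x10000-#x10FFFF]). Intended for element text; requires UTF-8 input.
 */
function xml_strip_invalid_chars(string $s): string
{
    $out = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $s
    );
    if ($out === null) {
        // With /u, preg_replace() returns null when $s is not valid UTF-8.
        throw new InvalidArgumentException('input is not valid UTF-8');
    }
    return $out;
}

var_dump(xml_strip_invalid_chars("a\x01\x0Bb\x0Cc\td") === "abc\td"); // bool(true)
```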
If so, please post the base64-encoded result. For example, with $xml set to the return value of the program that generates the XML:
echo base64_encode($xml);
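Besides base64, which survives copy and paste and decodes back to the same bytes with base64_decode(), the suspicious key can be printed byte by byte. A sketch of a hypothetical hex_escape() helper that produces the \xNN notation used in the before/after dump later in the thread:

```php
<?php
/**
 * Hypothetical helper: render every byte of $s as \xNN so that control
 * bytes such as ^L (0x0C) become visible.
 */
function hex_escape(string $s): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        // '\\x' in single quotes is a literal backslash followed by x.
        $out .= sprintf('\\x%02x', ord($s[$i]));
    }
    return $out;
}

$key = "\x01o\x03\x01Return-Path";
echo hex_escape($key), "\n";
// \x01\x6f\x03\x01\x52\x65\x74\x75\x72\x6e\x2d\x50\x61\x74\x68

// Alternatively, base64 can be pasted safely and decoded to the same bytes.
echo base64_encode($key), "\n";
```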
The code that generates the XML:
function dump_xml_config_sub($arr, $indent, $in_charset, $out_charset) {
    // $listtags is assumed to be defined elsewhere: lowercase keys whose
    // numeric sub-arrays are written as repeated elements (e.g. attach)
    global $listtags;
    $xmlconfig = "";
    foreach ($arr as $ent => $val) {
        // Note: the key $ent is used verbatim as the element name; only values are escaped
        if (is_array($val)) {
            /* is it just a list of multiple values? */
            if (in_array(strtolower($ent), (array) $listtags)) {
                foreach ($val as $cval) {
                    if (is_array($cval)) {
                        $xmlconfig .= str_repeat("\t", $indent);
                        $xmlconfig .= "<$ent>\n";
                        $xmlconfig .= dump_xml_config_sub($cval, $indent + 1, $in_charset, $out_charset);
                        $xmlconfig .= str_repeat("\t", $indent);
                        $xmlconfig .= "</$ent>\n";
                    } else {
                        $xmlconfig .= str_repeat("\t", $indent);
                        if ((is_bool($cval) && ($cval == true)) || ($cval === "")) {
                            $xmlconfig .= "<$ent/>\n";
                        } elseif (!is_bool($cval)) {
                            $xmlconfig .= "<$ent>" . htmlspecialchars($cval) . "</$ent>\n";
                            // $xmlconfig .= "<$ent>" . htmlspecialchars(iconv($in_charset, $out_charset, $cval)) . "</$ent>\n";
                        }
                    }
                }
            } else {
                /* it's an array */
                $xmlconfig .= str_repeat("\t", $indent);
                $xmlconfig .= "<$ent>\n";
                $xmlconfig .= dump_xml_config_sub($val, $indent + 1, $in_charset, $out_charset);
                $xmlconfig .= str_repeat("\t", $indent);
                $xmlconfig .= "</$ent>\n";
            }
        } else {
            if ((is_bool($val) && ($val == true)) || ($val === "")) {
                $xmlconfig .= str_repeat("\t", $indent);
                $xmlconfig .= "<$ent/>\n";
            } elseif (!is_bool($val)) {
                $xmlconfig .= str_repeat("\t", $indent);
                $xmlconfig .= "<$ent>" . htmlspecialchars($val) . "</$ent>\n";
                // $xmlconfig .= "<$ent>" . htmlspecialchars(iconv($in_charset, $out_charset, $val)) . "</$ent>\n";
            }
        }
    }
    return $xmlconfig;
}

function dump_xml_config($arr, $rootobj, $encoding) {
    // Set in/out encoding
    $in_charset = $encoding;
    $out_charset = $encoding;
    // Do not dump temporary encoding attribute
    unset($arr['encoding']);
    $xmlconfig = "<?xml version=\"1.0\" encoding=\"{$encoding}\"?>\n";
    $xmlconfig .= "<$rootobj>\n";
    $xmlconfig .= dump_xml_config_sub($arr, 1, $in_charset, $out_charset);
    $xmlconfig .= "</$rootobj>\n";
    return $xmlconfig;
}
function file_put_contents_safe($filename, $data, $binary = FALSE) {
    // Write to a temporary file first, then rename it over the target
    $tmpfilename = sprintf("%s.%s", $filename, getmypid());
    $mode = (TRUE === $binary) ? "wb" : "w";
    if (!($fd = fopen($tmpfilename, $mode)))
        return FALSE;
    if (!fwrite($fd, $data)) {
        fclose($fd);
        return FALSE;
    }
    fclose($fd);
    if (!rename($tmpfilename, $filename)) {
        unlink($tmpfilename);
        return FALSE;
    }
    return TRUE;
}
// Fill in the array from #1
$xml_info = dump_xml_config($xml_data, "mailindex", 'utf-8');
// Write the XML text to the file
file_put_contents_safe($path, $xml_info);
After that, the XML file contains exactly what is shown in #13.
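Since dump_xml_config_sub() writes each key as a tag name without checking it, a broken file only shows up when it is read back. One option, sketched here, is to parse $xml_info before calling file_put_contents_safe() and collect libxml's messages with their line numbers. xml_errors is a hypothetical name; libxml_use_internal_errors(), libxml_get_errors() and libxml_clear_errors() are standard PHP functions.

```php
<?php
/**
 * Hypothetical helper: return libxml's error messages ("line N: ...") for
 * an XML string; an empty array means it parsed as well-formed XML.
 */
function xml_errors(string $xml): array
{
    // Collect errors instead of printing warnings, remembering the old setting.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    simplexml_load_string($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return $messages;
}

print_r(xml_errors("<data>\n<3Burn-Path>x</3Burn-Path>\n</data>"));
```

With that, a guard such as if (xml_errors($xml_info) === []) around the file_put_contents_safe() call would refuse to write a file that cannot be parsed, and the messages point at the offending line.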
http://topic.csdn.net/u/20120228/09/cf787cd0-5936-4a15-9661-895f362b5b16.html
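Parsing the XML with the function from #2 in the thread above gives this error:
XML error: Invalid character at line 66. Hopefully that makes it clear. How can I replace the invalid characters in it?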
How can I use a regex to pick out the unrecognizable characters in the array from #1 and replace them with ""?
I used the following regexes:
foreach ($arrSend[1] as $title => $value) {
    preg_replace('/^ [ [:punct:] ] + $/x','',$title);
    preg_replace('/^ [ [:alnum:] [:space:] [:punct:] ]+ $/x','',$value);
} //end for
but I still get XML error: XML_ERR_NAME_REQUIRED at line 43,
which means $title was not filtered...
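Two details explain why $title stays unchanged: preg_replace() returns the cleaned string instead of modifying its argument, and even an assignment to $title would not rename the key in $arrSend[1], because the foreach key variable is only a copy. (Both patterns are also anchored with ^ and $, so they can only ever match an entire string.) A sketch that rebuilds the array with cleaned keys at every level, removing bytes 0x00 to 0x1F; clean_keys is a hypothetical helper:

```php
<?php
/**
 * Hypothetical helper: return a copy of $arr in which every string key,
 * at every nesting level, has bytes 0x00-0x1F removed.
 * If two keys clean to the same name, the later one wins.
 */
function clean_keys(array $arr): array
{
    $clean = [];
    foreach ($arr as $key => $value) {
        // preg_replace() returns the result; it does not modify $key.
        $newKey = is_string($key) ? preg_replace('/[\x00-\x1f]/', '', $key) : $key;
        $clean[$newKey] = is_array($value) ? clean_keys($value) : $value;
    }
    return $clean;
}

$data = clean_keys(["\x01o\x03\x01Return-Path" => '', 'Attachs' => ['attach' => [['id' => 2]]]]);
echo implode(', ', array_keys($data)), "\n"; // oReturn-Path, Attachs
```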
$ent = preg_replace('/[\x0-\x1f]/', '', $ent); // add this (in the foreach loop of dump_xml_config_sub)
ent-->>��Return-Path
It still isn't filtered out!
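Before filtering: \x01\x6f\x03\x01\x52\x65\x74\x75\x72\x6e\x2d\x50\x61\x74\x68
After filtering:  \x6f\x52\x65\x74\x75\x72\x6e\x2d\x50\x61\x74\x68
If you also want to remove the -, the rule is /[-\x0-\x1f]/.
If you only want to keep legal name characters, the rule is /\W/.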
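Running the three rules on the key bytes from the dump above shows what each one leaves. Note that \W, without the /u modifier, by default treats only letters, digits and _ as word characters, so it removes the hyphen but does not stop a name from starting with a digit.

```php
<?php
$key = "\x01o\x03\x01Return-Path"; // bytes from the "before filtering" dump

$controlOnly = preg_replace('/[\x0-\x1f]/', '', $key);  // drop bytes 0x00-0x1F
$alsoHyphen  = preg_replace('/[-\x0-\x1f]/', '', $key); // ...and also '-'
$wordOnly    = preg_replace('/\W/', '', $key);           // keep only word characters

echo $controlOnly, "\n"; // oReturn-Path
echo $alsoHyphen, "\n";  // oReturnPath
echo $wordOnly, "\n";    // oReturnPath
```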
<turn-Path><[email protected]></turn-Path>
……
<3Burn-Path><[email protected]></3Burn-Path>
……
But when I parse it with the XML function above, I get:
XML error: XML_ERR_NAME_REQUIRED at line 43
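I looked carefully and can't find anything wrong with the code. Is it that XML tag names can't start with a digit?
I'm still looking into it. Thanks for your help, moderator!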
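To the last question: yes. In XML 1.0 a name must start with a letter, _ or :, and digits, - and . are only allowed after the first character, so <3Burn-Path> is rejected while <turn-Path> parses. A sketch that lets libxml itself judge a candidate name, relying on the DOM extension: the DOMElement constructor throws a DOMException for an invalid name. is_xml_name is a hypothetical helper.

```php
<?php
/**
 * Hypothetical helper: ask libxml (via ext/dom) whether $name is a legal
 * XML element name. DOMElement's constructor throws DOMException otherwise.
 */
function is_xml_name(string $name): bool
{
    try {
        new DOMElement($name);
        return true;
    } catch (DOMException $e) {
        return false;
    }
}

var_dump(is_xml_name('turn-Path'));  // bool(true)
var_dump(is_xml_name('3Burn-Path')); // bool(false)
```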