I have an array:
Array ( [oReturn-Path] => [Delivered-To] => [email protected] [Received] => (ncmail 17263 invoked by uid 401); 01 Feb 2012 01:39:57 -0000 [Message-ID] => <[email protected]> [Date] => Wed, 01 Feb 2012 09:39:57 +0800 [From] => [email protected] [MIME-Version] => 1.0 [Subject] => subject [To] => [email protected] [Content-Type] => multipart/mixed; boundary="------------1463804415-2084078365-1328060397=:2673" [Time] => 2012-02-01 09:39:57 [Size] => 1.12 KB [Attachs] => Array ( [attach] => Array ( [0] => Array ( [id] => 2 [name] => attach.txt [level] => 非密 ) ) ) [bodyId] => 1 )
which I compile into XML:
                <data>
                        <^LReturn-Path>&lt;[email protected]&gt;</^LReturn-Path>
                        <Delivered-To>[email protected]</Delivered-To>
                        <Received>(ncmail  17263 invoked by uid 401); 01 Feb 2012 01:39:57 -0000</Received>
                        <Message-ID>&lt;[email protected]&gt;</Message-ID>
                        <Date>Wed, 01 Feb 2012 09:39:57 +0800</Date>
                        <From>[email protected]</From>
                        <MIME-Version>1.0</MIME-Version>
                        <Subject>subject</Subject>
                        <To>[email protected]</To>
                        <Content-Type>multipart/mixed; boundary=&quot;------------1463804415-2084078365-1328060397=:2673&quot;</Content-Type>
                        <Time>2012-02-01 09:39:57</Time>
                        <Size>1.12 KB</Size>
                        <Attachs>
                                <attach>
                                        <id>2</id>
                                        <name>attach.txt</name>
                                        <level>非密</level>
                                </attach>
                        </Attachs>
                        <bodyId>1</bodyId>
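                </data>
When I read the XML file back and convert it into an array, parsing fails at <^LReturn-Path> because it contains binary characters. How can I avoid this, or is there some way to read it anyway?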

Solutions

  1.   

    I used preg_match('/^ [ [:alnum:] [:space:] [:punct:] ]+ $/x',$title); but it doesn't filter everything out:
    XML error: Invalid character at line 66. Any regex experts who can help?
      

  2.   

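    I don't see any binary data in <^LReturn-Path>, and using ^LReturn-Path as an array key won't cause an error either.
    Note: in a PHP associative array, a key can be any string at all (or an integer); it's just that ^LReturn-Path does not follow the basic naming rules for XML tags, so you need to handle it specially when parsing.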
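A minimal PHP sketch of the moderator's point, assuming that "^L" stands for a form-feed byte (0x0C) at the start of the key (the thread never confirms the exact byte). PHP stores the key without complaint, but once it is concatenated into a tag, the XML parser rejects the document. `wrap_in_data()` is a hypothetical helper, and `simplexml_load_string()` is used only as a convenient parser.

```php
<?php
// Assumption: "^L" in the dump is a form feed (byte 0x0C) at the start of the key.
$badKey  = "\x0CReturn-Path";
$goodKey = 'Return-Path';

// PHP accepts both strings as array keys without complaint.
$arr = [$badKey => 'a', $goodKey => 'b'];

// Hypothetical helper that builds the tag by string concatenation,
// the same way the generator posted later in this thread does.
function wrap_in_data(string $name, string $value): string
{
    return "<data><{$name}>" . htmlspecialchars($value) . "</{$name}></data>";
}

// Collect libxml errors internally instead of printing warnings.
libxml_use_internal_errors(true);

$bad  = simplexml_load_string(wrap_in_data($badKey, $arr[$badKey]));
$good = simplexml_load_string(wrap_in_data($goodKey, $arr[$goodKey]));

var_dump($bad === false);                  // the form feed makes the tag name invalid
var_dump((string) $good->{'Return-Path'}); // the clean name parses normally
```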
      

  3.   

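    The moderator finally weighs in...
    How do I replace the "invalid characters" in the XML with spaces? What should the regex be, applied before the array is compiled into XML?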
      

  4.   

    If you just want a quick workaround, simply replace the ^:
    $xml=<<<XML
     <data>
    <^LReturn-Path>&lt;[email protected]&gt;</^LReturn-Path>
    <Delivered-To>[email protected]</Delivered-To>
    ........
    <bodyId>1</bodyId>
    </data>
    XML;
    $xml = str_replace('^', '', $xml);
    $obj=simplexml_load_string($xml);
    print_r($obj);
      

  5.   

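    When the array key [oReturn-Path] is compiled, it becomes <^LReturn-Path>, where ^L is the "invalid character"; sometimes it is compiled to <oReturn-Path> instead, where the o is also an invalid character. Very frustrating.
    If I just want to pick out every character that is not an English letter, a Chinese character, or an ordinary symbol, and replace it with " ", how should the regex be written?
    Although ^L looks readable, reading the file reports it as invalid...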
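A sketch of the whitelist regex asked for here, assuming the data is UTF-8 and that "Chinese characters" means the CJK Unified Ideographs block U+4E00 to U+9FFF (full-width punctuation such as ，。 lies outside that block and would be replaced too). `keep_ascii_and_cjk()` is a hypothetical name.

```php
<?php
// Hypothetical helper: replace everything except printable ASCII, CJK Unified
// Ideographs and \t \n \r with a space. Assumes the input is UTF-8.
function keep_ascii_and_cjk(string $s): string
{
    // \x{20}-\x{7E}     printable ASCII: English letters, digits, ordinary symbols, space
    // \x{4E00}-\x{9FFF} CJK Unified Ideographs (common Chinese characters)
    $out = preg_replace('/[^\x{20}-\x{7E}\x{4E00}-\x{9FFF}\t\n\r]/u', ' ', $s);
    if ($out === null) {
        // preg_replace() returns null when the /u modifier meets invalid UTF-8.
        throw new InvalidArgumentException('input is not valid UTF-8');
    }
    return $out;
}

echo keep_ascii_and_cjk("\x01o\x03\x01Return-Path"), "\n";
echo keep_ascii_and_cjk("level: 非密"), "\n";
```

The o survives this filter: as the byte dump later in the thread shows, it is the ordinary letter 0x6F, so no character-class regex can tell it apart from a real name character. For tag names, replace with '' rather than ' ', because a space is not allowed in an XML name either.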
      

  6.   

    How can I replace every character that XML can't recognize with a space?
      

  7.   

    Could everyone take a look at how to filter the "Invalid character" out of this XML:
    <?xml version="1.0" encoding="utf-8"?>
    <mailindex>
            <datas>
                    <data>
                            <oReturn-Path>&lt;[email protected]&gt;</oReturn-Path>
                            <Delivered-To>[email protected]</Delivered-To>
                            <Received>(ncmail  17263 invoked by uid 401); 01 Feb 2012 01:39:57 -0000</Received>
                            <Message-ID>&lt;[email protected]&gt;</Message-ID>
                            <Date>Wed, 01 Feb 2012 09:39:57 +0800</Date>
                            <From>[email protected]</From>
                            <MIME-Version>1.0</MIME-Version>
                            <Subject>subject</Subject>
                            <To>[email protected]</To>
                            <Content-Type>multipart/mixed; boundary=&quot;------------1463804415-2084078365-1328060397=:2673&quot;</Content-Type>
                            <Time>2012-02-01 09:39:57</Time>
                            <Size>1.12 KB</Size>
                            <Attachs>
                                    <attach>
                                            <id>2</id>
                                            <name>attach.txt</name>
                                            <level>非密</level>
                                    </attach>
                            </Attachs>
                            <bodyId>1</bodyId>
                    </data>
                    <data>
                            <^LReturn-Path>&lt;[email protected]&gt;</^LReturn-Path>
                            <Delivered-To>[email protected]</Delivered-To>
                            <Received>by 10.180.84.98 with HTTP; Sun, 29 Jan 2012 19:05:12 -0800 (PST)</Received>
                            <DKIM-Signature>v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=6HSCkxH4EtEI801Yan/oYF7mNVUNULbL3owo3o9Z0YQ=; b=iD0DjhMG1dRC/iJKRr7FnrtXXieSfbBEZXt58MCmpyaYOBBvLjS4/s4DWHph9DxWBR OBD0Ge/u2zaab0LoB95D7kggOMXUWOcL9iG0VgOpn2qdgbryGs2+hasVaZ8iTkNgFoXb k1JqJmLp8zu0ESJ54JwpizIv9VwkBDQGkSQik=</DKIM-Signature>
                            <MIME-Version>1.0</MIME-Version>
                            <In-Reply-To>&lt;CAExYbSWP_XvcJRhj3yMSOOr8h4dLTdbk_d8W1dMeVcmYZ=PpYg@mail.gmail.com&gt;</In-Reply-To>
                            <References>&lt;CAExYbSXhNw1f1ks8LkPi01aa64iUZsMXPkiHjJ_dkFCLu-0fPg@mail.gmail.com&gt; &lt;CAExYbSWP_XvcJRhj3yMSOOr8h4dLTdbk_d8W1dMeV[email protected]&gt;</References>
                            <Date>Mon, 30 Jan 2012 11:05:12 +0800</Date>
                            <Message-ID>&lt;CAExYbSVcMscurCqoydUpVHwGqv012o40jFYQ=hhwScQKE1eoyQ@mail.gmail.com&gt;</Message-ID>
                            <Subject>Fwd: test gmail</Subject>
                            <From>sean yan &lt;[email protected]&gt;</From>
                            <To>[email protected]</To>
                            <Content-Type>multipart/mixed; boundary=f46d0444eab777d31604b7b61d56</Content-Type>
                            <Time>2012-01-30 11:05:12</Time>
                            <Size>120.92 KB</Size>
                            <Attachs>
                                    <attach>
                                            <id>2</id>
                                            <name>.jpg</name>
                                            <level>非密</level>
                                    </attach>
                                    <attach>
                                            <id>3</id>
                                            <name>.jpg</name>
                                            <level>非密</level>
                                    </attach>
                                    <attach>
                                            <id>4</id>
                                            <name>.jpg</name>
                                            <level>非密</level>
                                    </attach>
                            </Attachs>
                            <bodyId>1.1</bodyId>
                    </data>
           </datas>
    </mailindex>
    I filtered the invalid characters with:
    // filter out invalid characters
    function XmlSafeStr($s) {
        return preg_replace("/[\\x00-\\x08\\x0b-\\x0c\\x0e-\\x1f]/", '', $s);
    }
    but there is still an error: XML error: XML_ERR_NAME_REQUIRED at line 50
      

  8.   

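    Could someone give me sample code? I added code to replace invalid characters both before and after parsing, but it had no effect; the o, the ^L and so on are not filtered out.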
      

  9.   

    Is your XML generated by a program?
    If so, post it base64-encoded: assign the return value of the code that generates the XML to $xml, then
    echo base64_encode($xml);
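Along the same lines as the base64 suggestion, a hypothetical pair of helpers that print the raw bytes of every array key as hex, so control characters that display as ^L, o or �� become visible. The sample key is built from the byte values posted later in this thread; `hex_bytes()` and `dump_key_bytes()` are illustrative names.

```php
<?php
// Hypothetical debugging helper: print each byte of a string as two hex digits,
// so invisible control characters become visible.
function hex_bytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

// Walk a (possibly nested) array and print the bytes of every key.
function dump_key_bytes(array $arr, string $indent = ''): void
{
    foreach ($arr as $key => $val) {
        echo $indent, hex_bytes((string) $key), "\n";
        if (is_array($val)) {
            dump_key_bytes($val, $indent . '  ');
        }
    }
}

dump_key_bytes(["\x01o\x03\x01Return-Path" => '', 'Attachs' => ['attach' => []]]);
```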
      

  10.   

    The XML is generated with this PHP code:
    function dump_xml_config_sub($arr, $indent, $in_charset, $out_charset) {
        global $listtags;
        $xmlconfig = "";
        foreach ($arr as $ent => $val) {
            if (is_array($val)) {
                /* is it just a list of multiple values? */
                if (in_array(strtolower($ent), $listtags)) {
                    foreach ($val as $cval) {
                        if (is_array($cval)) {
                            $xmlconfig .= str_repeat("\t", $indent);
                            $xmlconfig .= "<$ent>\n";
                            $xmlconfig .= dump_xml_config_sub($cval, $indent + 1, $in_charset, $out_charset);
                            $xmlconfig .= str_repeat("\t", $indent);
                            $xmlconfig .= "</$ent>\n";
                        } else {
                            $xmlconfig .= str_repeat("\t", $indent);
                            if ((is_bool($cval) && ($cval == true)) || ($cval === ""))
                                $xmlconfig .= "<$ent/>\n";
                            elseif (!is_bool($cval))
                                $xmlconfig .= "<$ent>" . htmlspecialchars($cval) . "</$ent>\n";
                            // $xmlconfig .= "<$ent>" . htmlspecialchars(iconv($in_charset, $out_charset, $cval)) . "</$ent>\n";
                        }
                    }
                } else {
                    /* it's an array */
                    $xmlconfig .= str_repeat("\t", $indent);
                    $xmlconfig .= "<$ent>\n";
                    $xmlconfig .= dump_xml_config_sub($val, $indent + 1, $in_charset, $out_charset);
                    $xmlconfig .= str_repeat("\t", $indent);
                    $xmlconfig .= "</$ent>\n";
                }
            } else {
                if ((is_bool($val) && ($val == true)) || ($val === "")) {
                    $xmlconfig .= str_repeat("\t", $indent);
                    $xmlconfig .= "<$ent/>\n";
                } elseif (!is_bool($val)) {
                    $xmlconfig .= str_repeat("\t", $indent);
                    $xmlconfig .= "<$ent>" . htmlspecialchars($val) . "</$ent>\n";
                    // $xmlconfig .= "<$ent>" . htmlspecialchars(iconv($in_charset, $out_charset, $val)) . "</$ent>\n";
                }
            }
        }
        return $xmlconfig;
    }

    function dump_xml_config($arr, $rootobj, $encoding) {
        // Set in/out encoding
        $in_charset = $encoding;
        $out_charset = $encoding;
        // Do not dump temporary encoding attribute
        unset($arr['encoding']);
        $xmlconfig = "<?xml version=\"1.0\" encoding=\"{$encoding}\"?>\n";
        $xmlconfig .= "<$rootobj>\n";
        $xmlconfig .= dump_xml_config_sub($arr, 1, $in_charset, $out_charset);
        $xmlconfig .= "</$rootobj>\n";
        return $xmlconfig;
    }

    function file_put_contents_safe($filename, $data, $binary = FALSE) {
        $tmpfilename = sprintf("%s.%s", $filename, getmypid());
        $mode = (TRUE === $binary) ? "wb" : "w";
        if (!($fd = fopen($tmpfilename, $mode)))
            return FALSE;
        if (!fwrite($fd, $data)) {
            fclose($fd);
            return FALSE;
        }
        fclose($fd);
        if (!rename($tmpfilename, $filename)) {
            unlink($tmpfilename);
            return FALSE;
        }
        return TRUE;
    }
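    // fill in the array from post #1
    $xml_info = dump_xml_config($xml_data, "mailindex", 'utf-8');
    // write the XML text to the file
    file_put_contents_safe($path, $xml_info);
    The XML file then contains what I posted in #13.
    http://topic.csdn.net/u/20120228/09/cf787cd0-5936-4a15-9661-895f362b5b16.html
    I parse the XML with the function from #2 of that thread, and this error appears:
    XML error: Invalid character at line 66. Is it clear now? How do I replace these invalid characters?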
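One possible alternative to building XML by string concatenation, sketched under the assumption that PHP's DOM extension is available: `DOMDocument::createElement()` throws a `DOMException` for an invalid element name, so a bad key fails loudly when the file is written instead of producing a file that only fails when it is read back. `array_to_dom()` and `build_xml()` are hypothetical names, and the list handling only approximates what `$listtags` does in the generator posted here. Text values are escaped by `createTextNode()`, but control characters inside them still need a separate filter.

```php
<?php
// Hypothetical alternative generator (not the thread's dump_xml_config()).
function array_to_dom(DOMDocument $doc, DOMElement $parent, array $arr): void
{
    foreach ($arr as $name => $value) {
        // A list such as 'attach' => [0 => [...], 1 => [...]] becomes repeated
        // <attach> elements (roughly what $listtags is for; an assumption).
        if (is_array($value) && $value !== [] && array_keys($value) === range(0, count($value) - 1)) {
            foreach ($value as $item) {
                array_to_dom($doc, $parent, [$name => $item]);
            }
            continue;
        }
        // Throws DOMException for names like "\x0CReturn-Path" or "3Burn-Path".
        $el = $doc->createElement((string) $name);
        if (is_array($value)) {
            array_to_dom($doc, $el, $value);
        } else {
            // createTextNode() escapes <, > and & when the document is saved.
            $el->appendChild($doc->createTextNode((string) $value));
        }
        $parent->appendChild($el);
    }
}

function build_xml(array $arr, string $root): string
{
    $doc = new DOMDocument('1.0', 'utf-8');
    $doc->formatOutput = true;
    $rootEl = $doc->createElement($root);
    $doc->appendChild($rootEl);
    array_to_dom($doc, $rootEl, $arr);
    return $doc->saveXML();
}

try {
    build_xml(["\x0CReturn-Path" => 'x'], 'mailindex');
} catch (DOMException $e) {
    echo 'rejected: ', $e->getMessage(), "\n";
}

echo build_xml([
    'Subject' => 'a < b',
    'Attachs' => ['attach' => [['id' => 2, 'level' => '非密'], ['id' => 3, 'level' => '非密']]],
], 'mailindex');
```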
      

  11.   

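    All right, second best then:
    how can I use a regex to pick out the unrecognizable characters in the array from post #1 and replace them with ""?
    I used the following:
    foreach ($arrSend[1] as $title => $value) {
        preg_replace('/^ [ [:punct:] ] + $/x','',$title);
        preg_replace('/^ [ [:alnum:] [:space:] [:punct:] ]+ $/x','',$value);
    } //end for
    but I get XML error: XML_ERR_NAME_REQUIRED at line 43,
    which means $title was not filtered...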
      

  12.   

    The special characters are in the associative-array keys:
    foreach ($arr as $ent => $val) {
      $ent = preg_replace('/[\x0-\x1f]/', '', $ent); // add this line
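A sketch that applies the same `/[\x0-\x1f]/` pattern to every key of the nested array in one pre-pass, instead of editing `dump_xml_config_sub()`. `clean_keys()` is a hypothetical helper; note that two keys differing only in control bytes would collapse into one.

```php
<?php
// Hypothetical pre-pass: strip bytes 0x00-0x1F from every string key, recursively.
function clean_keys(array $arr): array
{
    $out = [];
    foreach ($arr as $key => $val) {
        if (is_string($key)) {
            $key = preg_replace('/[\x0-\x1f]/', '', $key);
        }
        // Recurse into nested arrays such as Attachs => attach => [0 => [...]].
        $out[$key] = is_array($val) ? clean_keys($val) : $val;
    }
    return $out;
}

$data = [
    "\x01o\x03\x01Return-Path" => '',
    'Attachs' => ['attach' => [["\x0Cid" => 2, 'name' => 'attach.txt']]],
];
print_r(clean_keys($data));
```

It would be called just before the generator, for example `$xml_info = dump_xml_config(clean_keys($xml_data), "mailindex", 'utf-8');`.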
      

  13.   


    ent-->>�� Return-Path
    It wasn't filtered out.
      

  14.   

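    Byte values of oReturn-Path:
    before filtering: \x01\x6f\x03\x01\x52\x65\x74\x75\x72\x6e\x2d\x50\x61\x74\x68
    after filtering:  \x6f\x52\x65\x74\x75\x72\x6e\x2d\x50\x61\x74\x68
    If you also want to remove the -, the pattern is /[-\x0-\x1f]/.
    If you only want to keep characters allowed in an identifier (letters, digits, underscore), the pattern is /\W/.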
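The three patterns can be checked against the exact bytes posted in this reply. A small PHP check (the variable names are illustrative). Under PHP's default C locale, `/\W/` also drops every non-ASCII byte, so Chinese keys would vanish entirely, and it removes - even though - is legal in XML names.

```php
<?php
// Key bytes exactly as posted: \x01 o \x03 \x01 followed by "Return-Path".
$key = "\x01\x6f\x03\x01\x52\x65\x74\x75\x72\x6e\x2d\x50\x61\x74\x68";

$onlyControl   = preg_replace('/[\x0-\x1f]/', '', $key);  // drop bytes 0x00-0x1F
$controlDash   = preg_replace('/[-\x0-\x1f]/', '', $key); // also drop '-'
$wordCharsOnly = preg_replace('/\W/', '', $key);          // keep only [A-Za-z0-9_]

var_dump($onlyControl, $controlDash, $wordCharsOnly);
```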
      

  15.   

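    I checked again, and it really was filtered out; I hadn't assigned preg_replace's return value to a variable. Sorry about that:
    …
    <turn-Path>&lt;[email protected]&gt;</turn-Path>
    …
    <3Burn-Path>&lt;[email protected]&gt;</3Burn-Path>
    …
    But when I parse it with the XML function mentioned earlier, I get:
    XML error: XML_ERR_NAME_REQUIRED at line 43
    I looked carefully and couldn't find anything wrong with the code. Is it that XML tag names can't start with a digit?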
      

  16.   

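    Yes. A basic XML tag name should follow roughly the same rules as a variable name: it must start with a letter or underscore, never a digit (XML additionally allows - and . after the first character).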
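A hypothetical helper that turns an arbitrary key into a usable element name, simplified to ASCII: it keeps letters, digits, _, - and ., and prefixes an underscore when the result would otherwise start with a digit, - or ., or be empty. The real XML 1.0 Name rules also allow many non-ASCII characters (and ':' for namespaces), which this sketch deliberately drops.

```php
<?php
// Hypothetical helper, ASCII-only simplification of the XML 1.0 Name rules.
function to_xml_name(string $key): string
{
    // Keep ASCII letters, digits, '_', '.', '-'; drop everything else.
    $name = preg_replace('/[^A-Za-z0-9_.-]/', '', $key);
    // The first character must be a letter or '_' (not a digit, '-' or '.').
    if (!preg_match('/^[A-Za-z_]/', $name)) {
        $name = '_' . $name;
    }
    return $name;
}

foreach (["\x01o\x03\x01Return-Path", '3Burn-Path', 'MIME-Version'] as $key) {
    echo to_xml_name($key), "\n";
}
```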
      

  17.   

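    OK, the problem is basically solved, but there are still invalid characters in the XML element content, and they are not the same kind as the o. I'm stumped.
    I'm working hard to track them down. Thanks to the moderator for the help!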
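For the remaining invalid characters in element content, a sketch based on the XML 1.0 `Char` production (tab, LF, CR, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF). It replaces everything else with a space, as asked for earlier in the thread, and assumes UTF-8 input. `xml_safe_text()` is a hypothetical name; in the generator it could wrap each value, as in `htmlspecialchars(xml_safe_text($val))`.

```php
<?php
// Hypothetical helper: replace every character that XML 1.0 does not allow
// in text (anything outside the spec's "Char" production) with a space.
function xml_safe_text(string $s): string
{
    // Allowed: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        ' ',
        $s
    );
    if ($clean === null) {
        // With /u, preg_replace() returns null for invalid UTF-8.
        // Fallback (an assumption): replace only ASCII control bytes; the
        // invalid UTF-8 itself is left in place and would still break the XML.
        $clean = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', ' ', $s);
    }
    return $clean;
}

echo xml_safe_text("subject\x0Bline"), "\n";
```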