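I have recently been using the SAX parser in MSXML 4.0 to parse an XML file larger than 100 MB.
The file is not well formed: some node values use the entity &nbsp;, but it is never declared in a DTD.
I later found out that the entity has to be defined in the XML file itself, which means inserting the entity declaration
<!DOCTYPE CNAPS_DATA[ <!ENTITY nbsp " ">]> into the file.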
The source file is:
<?xml version="1.0" encoding="GB2312" standalone="yes"?>
<?xml-stylesheet type="text/css" href="CNAPS_DATA.css"?>
<CNAPS_DATA>
<CNAPS_BANK_DATA>
<CNAPS_BANK>
<CNAPS_BANK_BNKCODE>&nbsp;102536200083</CNAPS_BANK_BNKCODE>
<CNAPS_BANK_STATUS>1</CNAPS_BANK_STATUS>
<CNAPS_BANK_CATEGORY>07</CNAPS_BANK_CATEGORY>
<CNAPS_BANK_CLSCODE>102</CNAPS_BANK_CLSCODE>
<CNAPS_BANK_DRECCODE>102521009993</CNAPS_BANK_DRECCODE>
<CNAPS_BANK_NODECODE></CNAPS_BANK_NODECODE>
<CNAPS_BANK_SUPRLIST></CNAPS_BANK_SUPRLIST>
<CNAPS_BANK_PBCCODE></CNAPS_BANK_PBCCODE>
<CNAPS_BANK_CITYCODE>5362</CNAPS_BANK_CITYCODE>
<CNAPS_BANK_ACCTSTATUS></CNAPS_BANK_ACCTSTATUS>
<CNAPS_BANK_ASALTDT></CNAPS_BANK_ASALTDT>
<CNAPS_BANK_ASALTTM></CNAPS_BANK_ASALTTM>
<CNAPS_BANK_LNAME>&nbsp;中国工商银行赤壁市支行营业部&nbsp;</CNAPS_BANK_LNAME>
<CNAPS_BANK_SNAME>&nbsp;赤壁市支行营业部&nbsp;</CNAPS_BANK_SNAME>
<CNAPS_BANK_ADDR>湖北省赤壁市城西路5号</CNAPS_BANK_ADDR>
<CNAPS_BANK_POSTCODE>437300</CNAPS_BANK_POSTCODE>
<CNAPS_BANK_TEL>0175-5222078</CNAPS_BANK_TEL>
<CNAPS_BANK_EMAIL></CNAPS_BANK_EMAIL>
<CNAPS_BANK_EFFDATE>20041206</CNAPS_BANK_EFFDATE>
<CNAPS_BANK_INVDATE>29991231</CNAPS_BANK_INVDATE>
<CNAPS_BANK_ALTDATE>2004-11-30 11:03:06</CNAPS_BANK_ALTDATE>
<CNAPS_BANK_ALTTYPE>2</CNAPS_BANK_ALTTYPE>
<CNAPS_BANK_ALTISSNO>20040026</CNAPS_BANK_ALTISSNO>
<CNAPS_BANK_REMARK></CNAPS_BANK_REMARK>
</CNAPS_BANK>
</CNAPS_BANK_DATA>
</CNAPS_DATA>
After inserting the DTD entity declaration, it becomes:
<?xml version="1.0" encoding="GB2312" standalone="yes"?>
<?xml-stylesheet type="text/css" href="CNAPS_DATA.css"?>
<!DOCTYPE CNAPS_DATA[ <!ENTITY nbsp " ">]>
<CNAPS_DATA>
<CNAPS_BANK_DATA>
<CNAPS_BANK>
<CNAPS_BANK_BNKCODE>&nbsp;102536200083</CNAPS_BANK_BNKCODE>
<CNAPS_BANK_STATUS>1</CNAPS_BANK_STATUS>
<CNAPS_BANK_CATEGORY>07</CNAPS_BANK_CATEGORY>
<CNAPS_BANK_CLSCODE>102</CNAPS_BANK_CLSCODE>
<CNAPS_BANK_DRECCODE>102521009993</CNAPS_BANK_DRECCODE>
<CNAPS_BANK_NODECODE></CNAPS_BANK_NODECODE>
<CNAPS_BANK_SUPRLIST></CNAPS_BANK_SUPRLIST>
<CNAPS_BANK_PBCCODE></CNAPS_BANK_PBCCODE>
<CNAPS_BANK_CITYCODE>5362</CNAPS_BANK_CITYCODE>
<CNAPS_BANK_ACCTSTATUS></CNAPS_BANK_ACCTSTATUS>
<CNAPS_BANK_ASALTDT></CNAPS_BANK_ASALTDT>
<CNAPS_BANK_ASALTTM></CNAPS_BANK_ASALTTM>
<CNAPS_BANK_LNAME>&nbsp;中国工商银行赤壁市支行营业部&nbsp;</CNAPS_BANK_LNAME>
<CNAPS_BANK_SNAME>&nbsp;赤壁市支行营业部&nbsp;</CNAPS_BANK_SNAME>
<CNAPS_BANK_ADDR>湖北省赤壁市城西路5号</CNAPS_BANK_ADDR>
<CNAPS_BANK_POSTCODE>437300</CNAPS_BANK_POSTCODE>
<CNAPS_BANK_TEL>0175-5222078</CNAPS_BANK_TEL>
<CNAPS_BANK_EMAIL></CNAPS_BANK_EMAIL>
<CNAPS_BANK_EFFDATE>20041206</CNAPS_BANK_EFFDATE>
<CNAPS_BANK_INVDATE>29991231</CNAPS_BANK_INVDATE>
<CNAPS_BANK_ALTDATE>2004-11-30 11:03:06</CNAPS_BANK_ALTDATE>
<CNAPS_BANK_ALTTYPE>2</CNAPS_BANK_ALTTYPE>
<CNAPS_BANK_ALTISSNO>20040026</CNAPS_BANK_ALTISSNO>
<CNAPS_BANK_REMARK></CNAPS_BANK_REMARK>
</CNAPS_BANK>
</CNAPS_BANK_DATA>
</CNAPS_DATA>
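Because the file is large, with several million lines of data, I am hoping someone can suggest a sound solution. Performance is the main concern; as long as it is fast enough, that is fine.
Thanks, everyone.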
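
For what it is worth, one approach that might be fast enough is to skip XML parsing for this step and do a plain byte-level stream copy, writing the DOCTYPE just before the root start tag. The sketch below is in Python purely for illustration. The function name `insert_doctype` and its parameters are made up for this example and have nothing to do with MSXML. It assumes the root start tag `<CNAPS_DATA>` appears within the first 64 KB of the file, and that the encoding really is GB2312, where ASCII bytes such as `<` never occur inside a double-byte character, so searching raw bytes is safe.

```python
import shutil

# Same declaration as in the post, followed by a newline.
DOCTYPE = b'<!DOCTYPE CNAPS_DATA[ <!ENTITY nbsp " ">]>\n'


def insert_doctype(src_path, dst_path, root_tag=b"<CNAPS_DATA>",
                   doctype=DOCTYPE, head_size=64 * 1024):
    """Copy src_path to dst_path, inserting doctype before the root start tag.

    Only the first head_size bytes are searched for root_tag (an assumption
    of this sketch); the rest of the file is copied in large chunks, so
    memory use stays small however big the file is.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        head = src.read(head_size)
        pos = head.find(root_tag)
        if pos < 0:
            raise ValueError("root start tag not found in the first %d bytes"
                             % head_size)
        dst.write(head[:pos])   # XML declaration, stylesheet PI, etc.
        dst.write(doctype)      # the inserted entity declaration
        dst.write(head[pos:])   # root start tag and the rest of the head
        # Stream the remainder of the file in 1 MB chunks.
        shutil.copyfileobj(src, dst, 1024 * 1024)
```

Calling `insert_doctype("CNAPS_DATA.xml", "CNAPS_DATA_fixed.xml")` (both file names are placeholders) would produce a copy that the SAX parser can then read. Since the work is a single sequential read and write, the cost should be dominated by disk I/O rather than by the size of the document tree.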