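The XML file is laid out roughly like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
record 1
...
record n
</dblp>
The structure is perfectly standard, but n is so large that every attempt to process the file ends in an overflow. Could anyone help me think of a way to handle it? Many thanks!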
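First use a FileOutputStream to dump a few hundred records and take a look. If they follow a regular pattern, just read the file according to that pattern.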
I just tried to use dom4j's SAXReader to extract every incollection element from the XML into a new file, and it failed with this error:
Error on line 1 of document : The parser has encountered more than "64,000" entity expansions in this document; this is the limit imposed by the application. Nested exception: The parser has encountered more than "64,000" entity expansions in this document; this is the limit imposed by the application.
	at org.dom4j.io.SAXReader.read(SAXReader.java:482)
	at org.dom4j.io.SAXReader.read(SAXReader.java:264)
	at com.project.Cut.main(Cut.java:21)
Nested exception:
org.xml.sax.SAXParseException: The parser has encountered more than "64,000" entity expansions in this document; this is the limit imposed by the application.
	at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:195)
	at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:174)
	at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:388)
	at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:318)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1323)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1252)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(XMLDocumentFragmentScannerImpl.java:1906)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:3032)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
	at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
	at org.dom4j.io.SAXReader.read(SAXReader.java:465)
	at org.dom4j.io.SAXReader.read(SAXReader.java:264)
	at com.project.Cut.main(Cut.java:21)
Is this an overflow?
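Going by the trace (scanEntityReference, startEntity), this is not a memory overflow. The JDK's built-in parser stops after 64,000 entity expansions as a safety limit, and a file this size that relies on DTD-defined entities passes that count quickly. Below is a minimal sketch, assuming a JDK whose parser honours the jdk.xml.entityExpansionLimit system property. The class name, the handler and the tiny in-memory sample are made up for illustration; it counts incollection elements with plain SAX instead of building a dom4j tree.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class EntityLimitSketch {

    /** Counts <incollection> start tags; nothing is kept in memory. */
    static class CountingHandler extends DefaultHandler {
        int incollections = 0;

        @Override
        public void startElement(String uri, String localName,
                String qName, Attributes atts) {
            if ("incollection".equals(qName)) {
                incollections++;
            }
        }
    }

    public static int countIncollections(InputStream in) throws Exception {
        // Assumption: set before the parser is created; 0 means "no limit".
        System.setProperty("jdk.xml.entityExpansionLimit", "0");
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CountingHandler handler = new CountingHandler();
        parser.parse(in, handler);
        return handler.incollections;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample: the entity is declared inline so the demo
        // does not need dblp.dtd on disk.
        String sample = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
                + "<!DOCTYPE dblp [ <!ENTITY uuml \"&#252;\"> ]>\n"
                + "<dblp>"
                + "<incollection><author>M&uuml;ller</author></incollection>"
                + "<article><author>A</author></article>"
                + "<incollection><author>B</author></incollection>"
                + "</dblp>";
        InputStream in = new ByteArrayInputStream(
                sample.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(countIncollections(in));
    }
}
```

On older JDKs the same limit is usually raised from the command line instead, for example java -DentityExpansionLimit=2500000 followed by the usual class name. Even with the limit lifted, SAXReader.read() still builds the whole document in memory, so a streaming handler like the one above is likely the safer route for a file this large.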
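I have never handled XML this large, so these are only my own thoughts. Waiting for someone with real experience in this area.
The structure shouldn't be complicated; the only issue is the sheer volume of data.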
# -*- coding: UTF-8 -*-
# Split a large file into 80 MB parts: c:\ttt\test0001.zip, test0002.zip, ...
size = 80 * 1024 * 1024  # bytes per part
partnum = 0
with open("c:\\test.zip", "rb") as source:
    text = source.read(size)
    while text:
        partnum += 1
        with open("c:\\ttt\\test%04d.zip" % partnum, "wb") as fileoutput:
            fileoutput.write(text)
        text = source.read(size)
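import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

// Prints the file line by line; try-with-resources closes the stream even on error.
public static void read() throws Exception {
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream("c:/struts.xml")))) {
        String line;
        while ((line = reader.readLine()) != null) {
            System.err.println(line);
        }
    }
}
Run the code from a DOS prompt rather than inside the IDE, otherwise the IDE gets very sluggish.
The type command would be the simplest way, though; this is only meant to show how FileInputStream is used.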
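To look at only the first few hundred lines rather than printing the whole file, the loop can stop early. A minimal sketch, assuming the file is line-oriented; the class name HeadSketch, the head method and the limit of 300 are made up for illustration, and the charset matches the ISO-8859-1 declared in the XML header.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class HeadSketch {

    /** Returns at most maxLines lines from the source, then stops reading. */
    public static List<String> head(Reader source, int maxLines) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while (lines.size() < maxLines && (line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, read that file; otherwise use a tiny
        // hypothetical sample so the demo runs on its own.
        Reader source = args.length > 0
                ? Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.ISO_8859_1)
                : new StringReader("<dblp>\nrecord 1\nrecord 2\nrecord 3\n</dblp>\n");
        try (Reader in = source) {
            for (String line : head(in, 300)) {
                System.out.println(line);
            }
        }
    }
}
```

Because reading stops after maxLines, the size of the rest of the file does not matter.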
package com.huawei.hdm.util;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/**
 * The StrutsXMLParser class parses struts.xml.
 */
public class StrutsXMLParser {
    Map<String, Object> map = new HashMap<String, Object>();

    // DefaultHandler is the SAX base class whose callbacks do nothing by default.
    class StrutsXMLHandler extends DefaultHandler {
        private String cls;
        private boolean isUrl;
        private List<String> fileList = new ArrayList<String>();

        @Override
        public void startElement(String namespaceURI, String localName,
                String qName, Attributes atts) throws SAXException {
            if ("include".equals(qName)) {
                fileList.add(atts.getValue("file"));
            }
            if ("constant".equals(qName)) {
                map.put(atts.getValue("name"), atts.getValue("value"));
            }
            if ("action".equals(qName)) {
                cls = atts.getValue("class");
            }
            if ("result".equals(qName) && "error".equals(atts.getValue("name"))) {
                isUrl = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (cls != null && isUrl) {
                String url = new String(ch, start, length);
                // Key by simple class name, and test the same key that is stored.
                String key = cls.substring(cls.lastIndexOf(".") + 1);
                if (!map.containsKey(key)) {
                    map.put(key, url);
                }
                isUrl = false;
                cls = null;
            }
        }

        @Override
        public void endDocument() throws SAXException {
            if (fileList.size() > 0) {
                map.put("fileList", fileList);
            }
        }
    }

    @SuppressWarnings("unchecked")
    public Map<String, Object> parser(String xml) throws Exception {
        String path = xml.substring(0, xml.lastIndexOf("/") + 1);
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        XMLReader xr = sp.getXMLReader();
        StrutsXMLHandler handler = new StrutsXMLHandler();
        xr.setContentHandler(handler);
        xr.parse(xml);

        // Follow <include file="..."/> entries recursively.
        List<String> fileList = (List<String>) map.get("fileList");
        if (fileList != null) {
            map.remove("fileList");
            for (String file : fileList) {
                parser(path + file);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        try {
            Map<String, Object> map = new StrutsXMLParser().parser("struts.xml");
            for (Map.Entry<String, Object> entry : map.entrySet()) {
                System.err.println(entry.getKey() + " = " + entry.getValue());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Here is a SAX one:
import java.io.FileInputStream;
import java.io.IOException;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlParseTest {
    public static void main(String[] args) throws SAXException, IOException, ParserConfigurationException {
        // try-with-resources closes the stream even if parsing fails
        try (FileInputStream in = new FileInputStream("d:/test.xml")) {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser parser = spf.newSAXParser();
            parser.parse(in, new MyHandler());
        }
    }
}

// Prints each element name, indented by one tab per nesting level.
class MyHandler extends DefaultHandler {
    private int space = 0;

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        for (int i = 0; i < space; i++) {
            System.out.print('\t');
        }
        System.out.println(qName);
        space++;
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        space--;
    }
}
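For the original goal of copying every incollection record into a new file, a streaming copy avoids holding the document in memory. The following is only a sketch using the JDK's StAX event API, not anything posted in this thread. The class name IncollectionExtractSketch and the extract method are hypothetical, it assumes incollection records are not nested inside one another, and it assumes the jdk.xml.entityExpansionLimit property is honoured by the JDK in use.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

public class IncollectionExtractSketch {

    /** Copies each incollection subtree to out, wrapped in a new dblp root. */
    public static int extract(Reader in, Writer out) throws Exception {
        // Assumption: lifts the entity expansion limit (0 = no limit).
        System.setProperty("jdk.xml.entityExpansionLimit", "0");
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newInstance();

        writer.add(events.createStartElement("", "", "dblp"));
        int depth = 0;   // nesting depth inside the record being copied
        int copied = 0;  // number of records written
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (depth == 0) {
                // Outside a record: only an incollection start tag matters.
                if (event.isStartElement() && "incollection".equals(
                        event.asStartElement().getName().getLocalPart())) {
                    depth = 1;
                    copied++;
                    writer.add(event);
                }
            } else {
                // Inside a record: copy everything and track nesting.
                if (event.isStartElement()) {
                    depth++;
                } else if (event.isEndElement()) {
                    depth--;
                }
                writer.add(event);
            }
        }
        writer.add(events.createEndElement("", "", "dblp"));
        writer.flush();
        writer.close();
        reader.close();
        return copied;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical in-memory sample standing in for the real file.
        String sample = "<dblp>"
                + "<incollection><title>A</title></incollection>"
                + "<article><title>X</title></article>"
                + "<incollection><title>B</title></incollection>"
                + "</dblp>";
        StringWriter out = new StringWriter();
        int n = extract(new StringReader(sample), out);
        System.out.println(n + " records copied");
        System.out.println(out);
    }
}
```

For the real file, an InputStream is probably the better input so the parser applies the ISO-8859-1 encoding declared in the header, and dblp.dtd has to be somewhere the parser can resolve it; XMLInputFactory has createXMLEventReader overloads that take a systemId for that purpose.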