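As the title says, I wrote a Java program to convert HTML to text.
I got the web page data from my teacher (more than 8,000 web pages, stored in subdirectories).
My program runs: I choose the directory to analyze and the directory to save the files to, and it starts, but strangely three exceptions occur partway through.
Three files cannot be parsed, yet when I debug and parse those three files individually, they produce results. I am baffled. I hope some of the experts here can take a moment to look at this. I am new to Java and would be very grateful!
The exception is as follows: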
java.lang.ArrayIndexOutOfBoundsException: 85
at javax.swing.text.html.parser.ContentModel.first(ContentModel.java:177)
at javax.swing.text.html.parser.ContentModel.first(ContentModel.java:156)
at javax.swing.text.html.parser.ContentModelState.advance(ContentModelState.java:191)
at javax.swing.text.html.parser.ContentModelState.advance(ContentModelState.java:195)
at javax.swing.text.html.parser.TagStack.advance(TagStack.java:154)
at javax.swing.text.html.parser.Parser.legalElementContext(Parser.java:540)
at javax.swing.text.html.parser.Parser.legalTagContext(Parser.java:715)
at javax.swing.text.html.parser.Parser.parseTag(Parser.java:1930)
at javax.swing.text.html.parser.Parser.parseContent(Parser.java:2063)
at javax.swing.text.html.parser.Parser.parse(Parser.java:2230)
at javax.swing.text.html.parser.DocumentParser.parse(DocumentParser.java:122)
at javax.swing.text.html.parser.ParserDelegator.parse(ParserDelegator.java:90)
at html2test.Html2Text.parse(Html2Text.java:33)
at html2test.Html2Text.getfile(Html2Text.java:57)
at html2test.Html2Text.getfile(Html2Text.java:77)
at html2test.Html2Text.getfile(Html2Text.java:77)
at html2test.Html2Text.access$000(Html2Text.java:22)
at html2test.Html2Text$FileChooserDemo.actionPerformed(Html2Text.java:172)
at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2012)
at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2335)
at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:404)
at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:253)
at java.awt.Component.processMouseEvent(Component.java:6099)
at javax.swing.JComponent.processMouseEvent(JComponent.java:3287)
at java.awt.Component.processEvent(Component.java:5864)
at java.awt.Container.processEvent(Container.java:2109)
at java.awt.Component.dispatchEventImpl(Component.java:4460)
at java.awt.Container.dispatchEventImpl(Container.java:2167)
at java.awt.Component.dispatchEvent(Component.java:4286)
at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4465)
at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4129)
at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4059)
at java.awt.Container.dispatchEventImpl(Container.java:2153)
at java.awt.Window.dispatchEventImpl(Window.java:2554)
at java.awt.Component.dispatchEvent(Component.java:4286)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:604)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:275)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:200)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:190)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:185)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:177)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:138)
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.*;
import java.io.*;
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;

public class Html2Text extends HTMLEditorKit.ParserCallback {
    StringBuilder s;
    static String input;
    static String output;

    public Html2Text() {}

    public void parse(Reader in) throws IOException {
        s = new StringBuilder();
        ParserDelegator delegator = new ParserDelegator();
        // Third argument is ignoreCharSet: ignore any charset declared in the document.
        delegator.parse(in, this, Boolean.TRUE);
    }

    @Override
    public void handleText(char[] text, int pos) {
        s.append(text);
    }

    public String getText() {
        return s.toString();
    }

    // Parse one HTML file and write the extracted text to outFile.
    private static void convert(File inFile, File outFile) {
        try {
            Html2Text parser = new Html2Text();
            try (FileReader in = new FileReader(inFile)) {
                parser.parse(in);
            }
            try (BufferedWriter out = new BufferedWriter(new FileWriter(outFile))) {
                out.write(parser.getText());
            }
        } catch (Exception e) {
            // Report which file failed, then carry on with the next one.
            System.err.println("Error in " + inFile + ": " + e.getMessage());
            e.printStackTrace();
        }
    }

    // Convert every file under Dir, mirroring the directory tree under Dirout.
    private static void getfile(String Dir, String Dirout) {
        File folder = new File(Dir);
        File[] listOfFiles = folder.listFiles(); // null when Dir is not a directory
        if (listOfFiles != null) {
            for (int i = 0; i < listOfFiles.length; i++) {
                if (listOfFiles[i].isFile()) {
                    convert(listOfFiles[i],
                            new File(Dirout + "/" + listOfFiles[i].getName() + ".txt"));
                } else if (listOfFiles[i].isDirectory()) {
                    new File(Dirout + '/' + listOfFiles[i].getName()).mkdir();
                    getfile(Dir + '/' + listOfFiles[i].getName(),
                            Dirout + '/' + listOfFiles[i].getName());
                }
            }
        } else {
            // A single file was selected instead of a directory.
            convert(folder, new File(Dirout + "/" + folder.getName() + ".txt"));
        }
    }

    // GUI
    public static class FileChooserDemo extends JPanel implements ActionListener {
        static private final String newline = "\n";
        JButton openButton, saveButton;
        JTextArea log;
        JFileChooser fc;

        public FileChooserDemo() {
            super(new BorderLayout());

            // Create the log first, because the action listeners need to refer to it.
            log = new JTextArea(5, 20);
            log.setMargin(new Insets(5, 5, 5, 5));
            log.setEditable(false);
            JScrollPane logScrollPane = new JScrollPane(log);

            // Create a file chooser.
            fc = new JFileChooser();
            fc.setFileSelectionMode(JFileChooser.FILES_AND_DIRECTORIES);

            openButton = new JButton("Open a File...");
            openButton.addActionListener(this);

            // Create the save button.
            saveButton = new JButton("Save a File...");
            saveButton.addActionListener(this);

            // For layout purposes, put the buttons in a separate panel.
            JPanel buttonPanel = new JPanel(); // uses FlowLayout
            buttonPanel.add(openButton);
            buttonPanel.add(saveButton);

            // Add the buttons and the log to this panel.
            add(buttonPanel, BorderLayout.PAGE_START);
            add(logScrollPane, BorderLayout.CENTER);
        }

        public void actionPerformed(ActionEvent e) {
            if (e.getSource() == openButton) {
                // Handle open button action.
                int returnVal = fc.showOpenDialog(FileChooserDemo.this);
                if (returnVal == JFileChooser.APPROVE_OPTION) {
                    File file = fc.getSelectedFile();
                    log.append("Opening: " + file.getAbsolutePath() + newline);
                    input = file.getAbsolutePath();
                } else {
                    log.append("Open command cancelled by user." + newline);
                }
                log.setCaretPosition(log.getDocument().getLength());
            } else if (e.getSource() == saveButton) {
                // Handle save button action.
                int returnVal = fc.showSaveDialog(FileChooserDemo.this);
                log.append("Parsing... " + newline);
                if (returnVal != JFileChooser.APPROVE_OPTION) {
                    log.append("Save command cancelled by user." + newline);
                } else if (input == null) {
                    // Guard: getfile would fail on a null input path.
                    log.append("Choose an input file or directory first." + newline);
                } else {
                    File file = fc.getSelectedFile();
                    log.append("Saving: " + file.getAbsolutePath() + newline);
                    output = file.getAbsolutePath();
                    getfile(input, output);
                }
                log.setCaretPosition(log.getDocument().getLength());
            }
        }
    }

    public static void main(String[] args) {
        JFrame frame = new JFrame("Html to Text");
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

        // Add content to the window.
        frame.add(new FileChooserDemo());

        // Display the window.
        frame.pack();
        frame.setVisible(true);
        System.out.println("Done!");
    }
}
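To check a suspect file on its own, the parsing step can be exercised without the GUI. This is only a minimal sketch, not the original program: the class name `HtmlTextExtractor` and the method `extract` are made up for illustration, and only `ParserDelegator` and `HTMLEditorKit.ParserCallback` come from the code above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlTextExtractor {
    /** Collects every text run that the Swing HTML parser reports. */
    static String extract(Reader in) {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data);
            }
        };
        try {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(in, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample input; replace with a FileReader on a real file.
        String html = "<html><body><p>Hello</p></body></html>";
        System.out.println(extract(new StringReader(html)));
    }
}
```

Swapping the `StringReader` for a `FileReader` on one of the three problem files gives a quick way to compare a single-file run with the batch run.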
This is a typical array index out-of-bounds error.
The problem is most likely in:
for (int i = 0; i < listOfFiles.length; i++) {
if (listOfFiles[i].isFile()) {
} else if (listOfFiles[i].isDirectory()) {
}
}
Try debugging it carefully.
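One way to test this suggestion is to record which input actually fails, rather than looking only at the loop index. The sketch below is an illustration under assumptions: `BatchRunner`, `runAll`, and the action passed in are hypothetical names, not part of the original program. It applies an action to each item and collects the items whose processing threw.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class BatchRunner {
    /** Applies action to each item and returns the items for which it threw. */
    static <T> List<T> runAll(List<T> items, Consumer<T> action) {
        List<T> failed = new ArrayList<>();
        for (T item : items) {
            try {
                action.accept(item);
            } catch (RuntimeException e) {
                // Keep going, but remember exactly which item failed.
                System.err.println("Failed on " + item + ": " + e);
                failed.add(item);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Hypothetical file names; the lambda stands in for the real parse step.
        List<String> names = Arrays.asList("a.html", "b.html", "c.html");
        List<String> failed = runAll(names, name -> {
            if (name.startsWith("b")) {
                throw new ArrayIndexOutOfBoundsException(85);
            }
        });
        System.out.println(failed); // prints [b.html]
    }
}
```

The failed list can then be rerun on its own to see whether the failure depends on the file itself or on what was processed before it.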
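According to the error message, one of the places where it fails is line 77:
else if (listOfFiles[i].isDirectory()) {
// System.out.println("Dir:"+Dir+"/"+listOfFiles[i].getName());
boolean success=(new File(Dirout+'/'+listOfFiles[i].getName())).mkdir();
getfile(Dir+ // this is line 77
'/'+listOfFiles[i].getName(),
Dirout+'/'
+listOfFiles[i].getName());
}
This puzzles me: given the earlier check for whether the entry is a directory, it should be impossible to enter this statement.
Specifically, I was testing a single directory containing 971 files, and the exception occurred at position 865, so the else branch should never have been executed.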
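A stack trace lists the frame that threw first, with its callers below it. A line such as `getfile(Html2Text.java:77)` lower down therefore only means that call was still in progress, not that the exception was raised there. In the trace posted above, the top frames are inside `javax.swing.text.html.parser`. The sketch below illustrates the reading order with made-up names (`TraceDemo`, `walk`, `leaf`); it is not the original program.

```java
public class TraceDemo {
    static void leaf() {
        int[] cells = new int[85];
        cells[85] = 1; // index 85 is out of range for length 85: throws here
    }

    static void walk(int depth) {
        if (depth == 0) {
            leaf();
        } else {
            walk(depth - 1); // appears in the trace only as a caller
        }
    }

    /** Returns the method name of the top (throwing) stack frame. */
    static String throwingMethod() {
        try {
            walk(2);
            return "none";
        } catch (ArrayIndexOutOfBoundsException e) {
            return e.getStackTrace()[0].getMethodName();
        }
    }

    public static void main(String[] args) {
        System.out.println(throwingMethod()); // prints leaf
    }
}
```

Here `walk` appears three times in the trace, yet the only frame that threw is `leaf`, in the same way that the recursive `getfile` frames sit below the parser frames.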
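The error I posted above seems to be the exception from the test on the 8,000-plus files with subdirectories.
I already know which file causes the error, so I tested just that directory of 971 files directly, and the error reported is as follows:
java.lang.ArrayIndexOutOfBoundsException: 85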
at javax.swing.text.html.parser.ContentModel.first(ContentModel.java:177)
at javax.swing.text.html.parser.ContentModel.first(ContentModel.java:156)
at javax.swing.text.html.parser.ContentModelState.advance(ContentModelState.java:191)
at javax.swing.text.html.parser.ContentModelState.advance(ContentModelState.java:195)
at javax.swing.text.html.parser.TagStack.advance(TagStack.java:154)
at javax.swing.text.html.parser.Parser.legalElementContext(Parser.java:540)
at javax.swing.text.html.parser.Parser.legalTagContext(Parser.java:715)
at javax.swing.text.html.parser.Parser.parseTag(Parser.java:1930)
at javax.swing.text.html.parser.Parser.parseContent(Parser.java:2063)
at javax.swing.text.html.parser.Parser.parse(Parser.java:2230)
at javax.swing.text.html.parser.DocumentParser.parse(DocumentParser.java:122)
at javax.swing.text.html.parser.ParserDelegator.parse(ParserDelegator.java:90)
at html2test.Html2Text.parse(Html2Text.java:33)
at html2test.Html2Text.getfile(Html2Text.java:57)
at html2test.Html2Text.access$000(Html2Text.java:22)
at html2test.Html2Text$FileChooserDemo.actionPerformed(Html2Text.java:172)
at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2012)
at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2335)
at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:404)
at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:253)
at java.awt.Component.processMouseEvent(Component.java:6099)
at javax.swing.JComponent.processMouseEvent(JComponent.java:3287)
at java.awt.Component.processEvent(Component.java:5864)
at java.awt.Container.processEvent(Container.java:2109)
at java.awt.Component.dispatchEventImpl(Component.java:4460)
at java.awt.Container.dispatchEventImpl(Container.java:2167)
at java.awt.Component.dispatchEvent(Component.java:4286)
at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4465)
at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4129)
at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4059)
at java.awt.Container.dispatchEventImpl(Container.java:2153)
at java.awt.Window.dispatchEventImpl(Window.java:2554)
at java.awt.Component.dispatchEvent(Component.java:4286)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:604)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:275)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:200)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:190)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:185)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:177)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:138)