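POI question

Hi everyone. I need to start using POI and have never used it before, so there are a few things I don't understand and would like to ask about:
1. Can POI read files in any format?
2. My organization has a kind of data file whose internal structure is similar to that of Microsoft Word. Can POI read it?
3. If it can, is there a sample program?
Thanks, everyone.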

Solutions

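POI is a project under Apache Jakarta. The name stands for Poor Obfuscation Implementation. POI's goal is to provide a set of Java APIs that make it easy to work with Microsoft Office files based on the Microsoft OLE 2 Compound Document format. Some of the POI APIs were developed only for the most commonly used Office files, Word and Excel; the others handle generic OLE 2 Compound Documents and property files. In other words, POI does not read arbitrary file formats. The following class extracts the text of a Word document:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;

import org.apache.poi.poifs.filesystem.DocumentEntry;
import org.apache.poi.poifs.filesystem.DocumentInputStream;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.util.LittleEndian;

public class WordExtractor {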
    public WordExtractor() {
    }

    /** Extracts the plain text of a Word (.doc) file read from the given stream. */
    public String extractText(InputStream in) throws IOException {
        ArrayList text = new ArrayList();
        POIFSFileSystem fsys = new POIFSFileSystem(in);

        // Load the whole "WordDocument" stream; it holds both the header and the text.
        DocumentEntry headerProps = (DocumentEntry) fsys.getRoot().getEntry("WordDocument");
        DocumentInputStream din = fsys.createDocumentInputStream("WordDocument");
        byte[] header = new byte[headerProps.getSize()];
        din.read(header);
        din.close();

        // Read the flags from the document header: bit 0x200 selects the table stream.
        int info = LittleEndian.getShort(header, 0xa);
        boolean useTable1 = (info & 0x200) != 0;

        // Offset, inside the table stream, of the data that contains the piece table.
        int complexOffset = LittleEndian.getInt(header, 0x1a2);

        String tableName = useTable1 ? "1Table" : "0Table";
        DocumentEntry table = (DocumentEntry) fsys.getRoot().getEntry(tableName);
        byte[] tableStream = new byte[table.getSize()];
        din = fsys.createDocumentInputStream(tableName);
        din.read(tableStream);
        din.close();

        findText(tableStream, complexOffset, text);

        StringBuffer sb = new StringBuffer();
        int size = text.size();
        for (int x = 0; x < size; x++) {
            WordTextPiece nextPiece = (WordTextPiece) text.get(x);
            int start = nextPiece.getStart();
            int length = nextPiece.getLength();
            String toStr;
            if (nextPiece.usesUnicode()) {
                // A Unicode piece always uses two bytes per character. The original
                // code used one multiplier for the whole file, which truncated
                // Unicode pieces in files that also contain 8-bit pieces.
                toStr = new String(header, start, length * 2, "UTF-16LE");
            } else {
                toStr = new String(header, start, length, "ISO-8859-1");
            }
            sb.append(toStr).append(" ");
        }
        return sb.toString();
    }

    /** Adds one WordTextPiece to text for every entry of the piece table. */
    private static void findText(byte[] tableStream, int complexOffset, ArrayList text)
            throws IOException {
        int pos = complexOffset;
        // Skip the prms that come before the piece table.
        // They contain data for actual fast-saved files.
        while (tableStream[pos] == 1) {
            pos++;
            int skip = LittleEndian.getShort(tableStream, pos);
            pos += 2 + skip;
        }
        if (tableStream[pos] != 2) {
            throw new IOException("corrupted Word file");
        }
        // Parse out the text pieces.
        int pieceTableSize = LittleEndian.getInt(tableStream, ++pos);
        pos += 4;
        int pieces = (pieceTableSize - 4) / 12;
        for (int x = 0; x < pieces; x++) {
            // The 8-byte piece descriptors follow the (pieces + 1) character positions.
            int filePos =
                LittleEndian.getInt(tableStream, pos + ((pieces + 1) * 4) + (x * 8) + 2);
            boolean unicode = (filePos & 0x40000000) == 0;
            if (!unicode) {
                filePos &= ~(0x40000000); // gives the FC in the document stream
                filePos /= 2;
            }
            int totLength =
                LittleEndian.getInt(tableStream, pos + (x + 1) * 4)
                    - LittleEndian.getInt(tableStream, pos + (x * 4));
            text.add(new WordTextPiece(filePos, totLength, unicode));
        }
    }

    public static void main(String[] args) {
        WordExtractor w = new WordExtractor();
        try {
            File file = new File("C:\\test.doc");
            InputStream in = new FileInputStream(file);
            try {
                System.out.println(w.extractText(in));
            } finally {
                in.close(); // release the file handle even if extraction fails
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

/** One contiguous run of text inside the WordDocument stream. */
class WordTextPiece {
    private int _fcStart;
    private boolean _usesUnicode;
    private int _length;

    public WordTextPiece(int start, int length, boolean unicode) {
        _usesUnicode = unicode;
        _length = length;
        _fcStart = start;
    }

    public boolean usesUnicode() {
        return _usesUnicode;
    }

    public int getStart() {
        return _fcStart;
    }

    public int getLength() {
        return _length;
    }
}
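
The bit test on 0x40000000 and the division by two in findText are the least obvious part of the class above, so here is a minimal standalone sketch of just that step. It does not use POI: the class and method names are made up for illustration, and java.nio.ByteBuffer stands in for POI's LittleEndian.getInt. It assumes the same convention as the code above, namely that a set flag bit marks an 8-bit piece whose stored position must be halved.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative only: decodes the file-position field of one piece descriptor. */
class PieceDescriptorSketch {

    // When this bit is set, the piece is stored as 8-bit text, not UTF-16LE.
    static final int COMPRESSED_FLAG = 0x40000000;

    /** Reads a 32-bit little-endian integer, as LittleEndian.getInt does in the class above. */
    static int readIntLE(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    static boolean isUnicode(int rawFilePos) {
        return (rawFilePos & COMPRESSED_FLAG) == 0;
    }

    /** Byte offset of the piece inside the WordDocument stream. */
    static int byteOffset(int rawFilePos) {
        if (isUnicode(rawFilePos)) {
            return rawFilePos;
        }
        return (rawFilePos & ~COMPRESSED_FLAG) / 2;
    }

    /** Number of bytes taken up by a piece of charCount characters. */
    static int byteLength(int rawFilePos, int charCount) {
        return isUnicode(rawFilePos) ? charCount * 2 : charCount;
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, 0x10, 0x00, 0x40}; // 0x40001000 stored little-endian
        int filePos = readIntLE(raw, 0);
        System.out.println(isUnicode(filePos));      // false
        System.out.println(byteOffset(filePos));     // 2048
        System.out.println(byteLength(filePos, 10)); // 10
    }
}
```

The byte length is computed per piece, which is why a single multiplier for the whole file is not enough when a document mixes 8-bit and Unicode pieces.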
Required jar for the WordExtractor class:
poi-3.0.1.jar
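
On questions 1 and 2: since POI's file-system layer only understands OLE 2 Compound Documents, a quick first check for a custom data file is whether it starts with the 8-byte OLE 2 signature. The sketch below needs only the JDK. The class name and the default path are assumptions for illustration. A match means only that POIFSFileSystem should be able to open the container; it says nothing about whether the streams inside follow the Word layout.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative only: tests whether data starts with the OLE 2 compound document signature. */
class Ole2SignatureCheck {

    // The 8 bytes found at the start of every OLE 2 Compound Document.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Returns true if the first 8 bytes of data equal the signature. */
    static boolean looksLikeOle2(byte[] data) {
        if (data.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if (data[i] != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads up to 8 bytes from the stream and checks them against the signature. */
    static boolean looksLikeOle2(InputStream in) throws IOException {
        byte[] head = new byte[OLE2_MAGIC.length];
        int filled = 0;
        while (filled < head.length) {
            int n = in.read(head, filled, head.length - filled);
            if (n < 0) {
                return false; // file is shorter than the signature
            }
            filled += n;
        }
        return looksLikeOle2(head);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real data file as the first argument.
        String path = args.length > 0 ? args[0] : "C:\\test.doc";
        InputStream in = new FileInputStream(path);
        try {
            System.out.println(path + " looks like OLE 2: " + looksLikeOle2(in));
        } finally {
            in.close();
        }
    }
}
```

If the check fails, the file is not an OLE 2 container and the WordExtractor class will not be able to open it.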
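
I. Introduction to POI
Jakarta POI is an Apache subproject whose goal is to handle OLE 2 objects. It provides a set of Java APIs for manipulating Windows documents. The most mature part at present is the HSSF interface, which handles MS Excel (97-2002) objects. Unlike an unformatted CSV file that Excel merely knows how to convert, what HSSF works with is a real Excel object, and you can control properties such as sheets and cells.

II. HSSF overview
HSSF stands for Horrible SpreadSheet Format. The name may sound a little funny, but in essence it is a very serious, proper API. With HSSF you can read, write, and modify Excel files in pure Java code. HSSF provides two kinds of API for reading: usermodel and eventusermodel, that is, the "user model" and the "event-user model". The former is easy to understand; the latter is more abstract but much more efficient.

III. Getting started
1. Preparation
Requirements: JDK 1.4+ and the POI development package. The latest POI toolkit can be downloaded from http://www.apache.org/dyn/closer.cgi/jakarta/poi/

2. Excel structure
HSSFWorkbook: the Excel document object
HSSFSheet: an Excel sheet
HSSFRow: an Excel row
HSSFCell: an Excel cell
HSSFFont: an Excel font
HSSFName: a name
HSSFDataFormat: data formats (for example, dates)
The following two are only available as of POI 1.7:
HSSFHeader: sheet header
HSSFFooter: sheet footer
And the style class:
HSSFCellStyle: cell style
Helper classes include:
HSSFDateUtil: dates
HSSFPrintSetup: printing
HSSFErrorConstants: table of error codes

3. Usage examples (using usermodel)
How to read Excel
To read an Excel file, first create a POIFSFileSystem object, then construct an HSSFWorkbook from it; that HSSFWorkbook object represents the Excel document. The following code reads the message string stored in the first cell of the first row of the first sheet: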
Java code
// Needs java.io.FileInputStream and java.io.IOException, plus
// org.apache.poi.poifs.filesystem.POIFSFileSystem and org.apache.poi.hssf.usermodel.*
try {
    // Note the doubled backslash: "d:\test.xls" would contain a tab character.
    POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream("d:\\test.xls"));
    HSSFWorkbook wb = new HSSFWorkbook(fs);
    HSSFSheet sheet = wb.getSheetAt(0);
    HSSFRow row = sheet.getRow(0);
    HSSFCell cell = row.getCell((short) 0);
    String msg = cell.getStringCellValue();
} catch (IOException e) {
    e.printStackTrace();
}

How to write Excel
The following code sets the value of the first cell in the first row of the first sheet to "a test".
Java code
FileInputStream fileIn = new FileInputStream("workbook.xls");
POIFSFileSystem fs = new POIFSFileSystem(fileIn);
HSSFWorkbook wb = new HSSFWorkbook(fs);
// The workbook is now held in memory, so close the input
// before the same file is opened for writing.
fileIn.close();

HSSFSheet sheet = wb.getSheetAt(0);
HSSFRow row = sheet.getRow(0);
HSSFCell cell = row.getCell((short) 0);
cell.setCellValue("a test");

// Write the output to a file
FileOutputStream fileOut = new FileOutputStream("workbook.xls");
wb.write(fileOut);
fileOut.close();
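
4. Reference documents
POI home page: http://jakarta.apache.org/poi/
Quick start for beginners with POI HSSF: http://jakarta.apache.org/poi/hssf/quick-guide.html
Code examples: http://blog.java-cn.com/user1/6749/archives/2005/18347.html (it contains many code examples that make it easy to get started).

IV. Usage notes
The usermodel package of POI HSSF maps an Excel file onto familiar structures such as Workbook, Sheet, Row, and Cell, and keeps the whole structure in memory as a set of objects. This makes it easy to understand and convenient to work with, and it basically meets our needs, so it is a good choice.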
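
To make the difference between the two reading styles more concrete, here is a toy sketch that uses only the JDK. None of the types are POI classes: CellRecord, CellListener, loadAll, and stream are hypothetical stand-ins, and a String[][] plays the role of the sheet. It only illustrates the general idea that the user-model style builds every object before you query it, while the event style hands you each item once and keeps nothing.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy contrast of the two reading styles; none of these types are POI classes. */
class ReadingModelsSketch {

    /** Hypothetical stand-in for one cell of a sheet. */
    static final class CellRecord {
        final int row;
        final int column;
        final String value;

        CellRecord(int row, int column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    /** Event style: the reader calls back once per cell. */
    interface CellListener {
        void onCell(CellRecord cell);
    }

    /** Event style: visit the cells in order and hand each one to the listener. */
    static void stream(String[][] sheet, CellListener listener) {
        for (int r = 0; r < sheet.length; r++) {
            for (int c = 0; c < sheet[r].length; c++) {
                listener.onCell(new CellRecord(r, c, sheet[r][c]));
            }
        }
    }

    /** User-model style: collect every cell object first, then let the caller query the list. */
    static List<CellRecord> loadAll(String[][] sheet) {
        final List<CellRecord> cells = new ArrayList<CellRecord>();
        stream(sheet, new CellListener() {
            public void onCell(CellRecord cell) {
                cells.add(cell);
            }
        });
        return cells;
    }

    /** Counts non-empty cells in the event style, keeping only a counter. */
    static int countNonEmptyStreaming(String[][] sheet) {
        final int[] count = {0};
        stream(sheet, new CellListener() {
            public void onCell(CellRecord cell) {
                if (cell.value.length() > 0) {
                    count[0]++;
                }
            }
        });
        return count[0];
    }

    public static void main(String[] args) {
        String[][] sheet = {{"a test", ""}, {"x", "y"}};
        System.out.println(loadAll(sheet).size());         // 4
        System.out.println(countNonEmptyStreaming(sheet)); // 3
    }
}
```

For small files the in-memory style is simpler to work with; the callback style mainly pays off when a file is too large to hold comfortably as objects.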