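Garbled Chinese characters when reading a PDF, 100 points awarded right away!!

In my Java program I read the contents of a PDF, but any Chinese characters come out garbled. Can anyone solve this? 100 points awarded right away.

Solutions »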


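4. Extracting text from PDF files with Chinese support: xpdf
xpdf is an open-source project. We can call its native command-line tool from Java to extract text from Chinese PDF files.
Download the xpdf package: http://www.matrix.org.cn/down_view.asp?id=15
You also need to download the Chinese support patch: http://www.matrix.org.cn/down_view.asp?id=16
Install the Chinese patch as described in the readme, and then you can write the Java program that calls the native tool.
Here is an example of how to call it: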
import java.io.*;
/**
* Title: pdf extraction
* Description: email:[email protected]
* Copyright: Matrix Copyright (c) 2003
* Company: Matrix.org.cn
* @author chris
* @version 1.0, if you use this example please retain this notice
*/
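public class PdfWin {
    public static void main(String[] args) throws Exception {
        // Path to the xpdf pdftotext executable and the PDF to extract
        String PATH_TO_XPDF = "C:\\Program Files\\xpdf\\pdftotext.exe";
        String filename = "c:\\a.pdf";
        // "-enc UTF-8" sets the output encoding, "-q" suppresses messages, "-" writes the text to stdout
        String[] cmd = new String[] { PATH_TO_XPDF, "-enc", "UTF-8", "-q", filename, "-" };
        Process p = Runtime.getRuntime().exec(cmd);
        BufferedInputStream bis = new BufferedInputStream(p.getInputStream());
        // Decode with the same encoding that pdftotext was asked to produce
        InputStreamReader reader = new InputStreamReader(bis, "UTF-8");
        StringWriter out = new StringWriter();
        char[] buf = new char[10000];
        int len;
        while ((len = reader.read(buf)) >= 0) {
            // Accumulate every chunk; buf alone only holds the most recent read
            out.write(buf, 0, len);
            System.out.println("the length is " + len);
        }
        reader.close();
        p.waitFor();
        // Build the result from the accumulated text, not from the raw buffer
        String ts = out.toString();
        System.out.println("the str is " + ts);
    }
}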
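If the text still comes out garbled, the usual cause is that the bytes are decoded with a different charset than the one they were written in. Below is a minimal sketch of that point, not a definitive implementation. The class name PdfTextSketch and the helpers readAll and extract are made up for illustration; extract assumes pdftotext accepts the same arguments as in the example above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PdfTextSketch {
    // Reads the whole stream and decodes it with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int len;
            while ((len = reader.read(buf)) >= 0) {
                sb.append(buf, 0, len);
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: runs pdftotext with the arguments from the example above
    // and decodes its stdout as UTF-8, matching the "-enc UTF-8" option.
    static String extract(String pdftotextPath, String pdfPath)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(pdftotextPath, "-enc", "UTF-8", "-q", pdfPath, "-").start();
        String text = readAll(p.getInputStream(), StandardCharsets.UTF_8);
        p.waitFor();
        return text;
    }

    public static void main(String[] args) throws Exception {
        // Two Chinese characters written as Unicode escapes, encoded as UTF-8 bytes
        String original = "\u4e2d\u6587";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        String ok = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        String garbled = readAll(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);
        System.out.println(ok.equals(original));      // prints true
        System.out.println(garbled.equals(original)); // prints false
    }
}
```

The main method needs no PDF or xpdf install: it only shows that the same bytes decode correctly with the matching charset and turn into garbage with the wrong one. With xpdf installed, calling extract with the paths from the example above should return the whole text as a single string.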
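This is for reference only; I hope it doesn't cause you more trouble.
You didn't say which open-source PDF tool you are using. If it is iText, then the problem is that iTextAsian.jar is missing.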