Java: converting files to a single character encoding (urgent)

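I have a batch of files in mixed encodings: some are GBK, some GB2312, and some UTF-8. I want to convert them all to UTF-8, so I'd like to write a Java program that reads every file in a given folder and rewrites it as UTF-8.
Here is my own attempt: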
import java.io.*;
import java.nio.charset.StandardCharsets;

private static void transferFile(File file) throws IOException {
    String lineSeparator = System.lineSeparator();
    StringBuilder content = new StringBuilder();
    // Read the whole file into memory; the source encoding is hard-coded as GBK.
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(file), "GBK"))) {
        String line;
        while ((line = reader.readLine()) != null) {
            content.append(line).append(lineSeparator);
        }
    }
    // Overwrite the same file with the text encoded as UTF-8.
    try (Writer writer = new OutputStreamWriter(
            new FileOutputStream(file), StandardCharsets.UTF_8)) {
        writer.write(content.toString());
    }
}
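But this only works when the file's encoding is known in advance. Since this is a batch job, I don't know each file's encoding beforehand. How should I handle that? This has been bothering me for a long time. Thanks, everyone.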

Solutions »


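For example, given a byte array byte[] bs:
String s = new String(bs, "GBK");
byte[] bs2 = s.getBytes("GBK");
if (Arrays.equals(bs, bs2)) {
    // very likely GBK-encoded
}
The principle: decoding with the wrong encoding changes the data irreversibly, so the bytes no longer survive a round trip.
Also: GBK is a superset of GB2312.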
To the poster above: could you turn your idea into code? I'm not very familiar with Java and I need this urgently. Thanks.
For detection based on markers at the start of the file, have a look at the cpdetector project:
http://cpdetector.sourceforge.net/
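As a rough illustration of what "markers at the start of the file" can mean, here is a minimal sketch that checks for a byte-order mark (BOM). It does not use cpdetector, and I make no claim about how cpdetector works internally; the class name BomSniffer and the method charsetFromBom are made up for this example. Note that GBK files never carry a BOM and many UTF-8 files don't either, so a null result decides nothing.

```java
final class BomSniffer {
    /**
     * Returns the charset name implied by a leading byte-order mark,
     * or null if the data does not start with one.
     * UTF-32 BOMs are deliberately ignored in this sketch.
     */
    static String charsetFromBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    // Compares the first bytes of data (as unsigned values) with the given prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

Only the first three bytes of each file are needed, so this is cheap to run before any heavier check.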
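import java.util.Arrays;

public static byte[] toUTF8(byte[] bs) {
    try {
        String s = new String(bs, "GBK");
        byte[] bs2 = s.getBytes("GBK");
        // If decoding as GBK and re-encoding reproduces the bytes, treat the input as GBK.
        if (Arrays.equals(bs, bs2)) {
            return s.getBytes("UTF-8");
        }
    } catch (Exception ex) {
        // Charset not supported on this JVM: fall through and return the input unchanged.
    }
    // Not recognised as GBK: return the original bytes.
    return bs;
}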
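A possible variation, offered as a sketch rather than a tested solution: check first whether the bytes are strictly valid UTF-8, and only treat them as GBK when they are not. This relies on the assumption that every file is either UTF-8 or GBK/GB2312; a very short GBK text could in principle also happen to be valid UTF-8. The class name ToUtf8 and its methods are made up for this example, and main overwrites files in place, so try it on a copy of the folder.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ToUtf8 {
    /** True if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns the same array if it is already valid UTF-8; otherwise assumes GBK. */
    static byte[] convert(byte[] bytes) {
        if (isValidUtf8(bytes)) return bytes;
        return new String(bytes, Charset.forName("GBK")).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ToUtf8 <directory>");
            return;
        }
        // Collect the regular files first, then rewrite only those that changed.
        List<Path> files;
        try (Stream<Path> paths = Files.walk(Paths.get(args[0]))) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            byte[] original = Files.readAllBytes(file);
            byte[] converted = convert(original);
            if (converted != original) {
                Files.write(file, converted);
            }
        }
    }
}
```

Since GBK is a superset of GB2312, decoding as GBK covers both of the non-UTF-8 cases mentioned in the question. Pure ASCII files count as valid UTF-8 and are left untouched.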
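Because the raw bytes of a file alone don't tell you which character set was used, this problem is a real headache. The approach in reply #4 is one option. Another: if you know some typical Chinese characters, words, or phrases that appear in most of the documents, take the byte sequences of those phrases (in both GBK and UTF-8) and search for them in each file. That should identify a document's encoding fairly quickly.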
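The phrase-matching idea could look roughly like the sketch below. It is only an illustration: the class name PhraseProbe and the method guessByPhrase are made up, and the phrase must contain non-ASCII characters, because ASCII text has the same bytes in both encodings and would leave the result undecided.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PhraseProbe {
    /**
     * Returns "GBK" or "UTF-8" according to which encoding of the phrase
     * occurs in data, or null if neither (or both) can be found.
     */
    static String guessByPhrase(byte[] data, String phrase) {
        boolean asGbk = indexOf(data, phrase.getBytes(Charset.forName("GBK"))) >= 0;
        boolean asUtf8 = indexOf(data, phrase.getBytes(StandardCharsets.UTF_8)) >= 0;
        if (asGbk == asUtf8) return null; // undecided
        return asGbk ? "GBK" : "UTF-8";
    }

    /** Naive byte-sequence search; returns the first match position or -1. */
    static int indexOf(byte[] haystack, byte[] needle) {
        if (needle.length == 0) return 0;
        outer:
        for (int i = 0; i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }
}
```

In practice one would probably try several common phrases and fall back to another check whenever the result is null.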
