A question about UTF-8 encoding

I have a question:
chinese = new String(data,"UTF-8") works fine, but:
ByteArrayInputStream bais = new ByteArrayInputStream( data1 );
DataInputStream di = new DataInputStream( bais );
chinese = di.readUTF();
produces an incorrect string. Why? Notes: 1. data is a byte array and is definitely correct.
2. The first two bytes of data1 hold the length of data, followed by an exact copy of data.

Solutions

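I ran into this problem too. I worked around it by specifying GBK as the encoding when instantiating the stream, but I would also like to know why that works. Can anyone explain?
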
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;

// Class name chosen here only so the snippet compiles on its own.
public class Utf8Demo {
    // Print each byte as unsigned hex, separated by commas.
    public static void print(byte[] arr) {
        for (int i = 0; i < arr.length; i++) {
            if (i > 0) {
                System.out.print(",");
            }
            // Mask with 0xff so negative bytes are not sign-extended to ffffffxx.
            System.out.print(Integer.toHexString(arr[i] & 0xff));
        }
        System.out.println();
    }

    public static void main(String[] args) throws Exception {
        String str = "中国";
        byte[] arr = str.getBytes("UTF-8");
        print(arr); // e4,b8,ad,e5,9b,bd
        String str2 = new String(arr, "UTF-8");
        System.out.println(str2); // 中国

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeUTF(str);
        byte[] arr2 = bos.toByteArray();
        print(arr2); // 0,6,e4,b8,ad,e5,9b,bd
    }
}
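// As you can see, once the string is written to the ByteArrayOutputStream with writeUTF, the byte array has two extra bytes at the front. What are they for? They hold the number of bytes the encoded string occupies. This has to be stored because the number of bytes per character varies in this encoding.
// DataOutputStream source:
for (int i = 0; i < strlen; i++) {
    c = str.charAt(i);
    if ((c >= 0x0001) && (c <= 0x007F)) { // one byte
        utflen++;
    } else if (c > 0x07FF) { // three bytes; a Chinese character takes three bytes
        utflen += 3;
    } else { // two bytes
        utflen += 2;
    }
}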
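To make the length rule concrete, here is a minimal sketch that wraps the loop quoted above in a standalone method. The class name UtfLenDemo and the method name modifiedUtf8Length are made up for illustration and are not part of the JDK. Note that U+0000 falls into the two-byte branch, which is one place where this encoding differs from standard UTF-8.

```java
class UtfLenDemo {
    // Hypothetical helper: same per-char rule as the DataOutputStream loop quoted above.
    static int modifiedUtf8Length(String s) {
        int utflen = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                utflen++;        // ASCII except NUL: one byte
            } else if (c > 0x07FF) {
                utflen += 3;     // e.g. Chinese characters: three bytes
            } else {
                utflen += 2;     // NUL and chars up to 0x07FF: two bytes
            }
        }
        return utflen;
    }

    public static void main(String[] args) {
        System.out.println(modifiedUtf8Length("abc"));          // 3
        System.out.println(modifiedUtf8Length("\u4e2d\u56fd")); // 6
        System.out.println(modifiedUtf8Length("\0"));           // 2
    }
}
```

For the two-character string in the example above this gives 6, which matches the 0,6 prefix in the printed byte array.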
// So when reading, the length is likewise read first:
// DataInputStream source:
public final static String readUTF(DataInput in) throws IOException {
    int utflen = in.readUnsignedShort();
    // ...
}

public final int readUnsignedShort() throws IOException {
    int ch1 = in.read();
    int ch2 = in.read();
    if ((ch1 | ch2) < 0)
        throw new EOFException();
    return (ch1 << 8) + (ch2 << 0);
}
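The readUnsignedShort code quoted above combines the two prefix bytes high byte first (big-endian). A small sketch of that arithmetic follows. The names LengthPrefixDemo, decodeLength and encodeLength are hypothetical helpers, not JDK methods. It shows that swapping the two bytes changes the length readUTF would try to read.

```java
class LengthPrefixDemo {
    // Hypothetical helper: combine two bytes the way readUnsignedShort does.
    static int decodeLength(byte hi, byte lo) {
        return ((hi & 0xFF) << 8) + (lo & 0xFF);
    }

    // Hypothetical helper: split a length (0..65535) into high byte, then low byte.
    static byte[] encodeLength(int len) {
        return new byte[] {(byte) (len >>> 8), (byte) len};
    }

    public static void main(String[] args) {
        System.out.println(decodeLength((byte) 0, (byte) 6)); // 6
        System.out.println(decodeLength((byte) 6, (byte) 0)); // 1536 (bytes swapped)
        byte[] prefix = encodeLength(300);
        System.out.println(prefix[0] + "," + prefix[1]);      // 1,44
    }
}
```

If a hand-built prefix is written low byte first, or holds a character count instead of a byte count, readUTF will read the wrong number of bytes.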

byte[] b = new byte[5];
b[0] = 0;
b[1] = 3;
b[2] = 97;
b[3] = 98;
b[4] = 99;
ByteArrayInputStream bais = new ByteArrayInputStream(b);
DataInputStream dis = new DataInputStream(bais);
System.out.println(dis.readUTF()); // prints abc
This works fine for me. Are you sure your byte array is correct?
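The original post does not show how data1 was built, so the following is only a guess at the intended setup: a sketch that prepends a two-byte big-endian byte count to a UTF-8 array and reads it back with readUTF. The names ReadUtfFromBytes, withLengthPrefix and decode are invented for this example. It assumes the text contains no U+0000 and no characters outside the Basic Multilingual Plane, since readUTF expects Java's modified UTF-8 and rejects standard four-byte UTF-8 sequences.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ReadUtfFromBytes {
    // Hypothetical helper: prepend the byte length as a 2-byte big-endian prefix.
    static byte[] withLengthPrefix(byte[] data) {
        if (data.length > 65535) {
            throw new IllegalArgumentException("too long for a 2-byte prefix");
        }
        byte[] out = new byte[data.length + 2];
        out[0] = (byte) (data.length >>> 8); // high byte first
        out[1] = (byte) data.length;         // then low byte
        System.arraycopy(data, 0, out, 2, data.length);
        return out;
    }

    // Hypothetical helper: read one length-prefixed string with readUTF.
    static String decode(byte[] data1) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data1))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "\u4e2d\u56fd".getBytes(StandardCharsets.UTF_8);
        String chinese = decode(withLengthPrefix(data));
        System.out.println(chinese.equals("\u4e2d\u56fd")); // true
    }
}
```

If the real data1 differs from what withLengthPrefix produces (swapped prefix bytes, a character count instead of a byte count, or a payload encoded in GBK rather than UTF-8), that difference is a likely place to look for the cause.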