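I'd like to ask a question:
This works fine:
chinese = new String(data,"UTF-8")
but this:
ByteArrayInputStream bais = new ByteArrayInputStream( data1 );
DataInputStream di = new DataInputStream( bais );
chinese = di.readUTF();
gives an incorrect string. Why? Notes:
1. data is a byte array, and it is definitely correct.
2. The first two bytes of data1 hold the length of data, followed by an exact copy of data.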
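A minimal sketch of the setup described in the question, assuming data1 is built by hand. `withLengthPrefix` and `readUtf` are hypothetical helper names, not part of any library. The sketch writes the length high byte first, which is the order readUTF expects, and for ordinary Chinese text the result matches new String(data, "UTF-8"). One caveat: readUTF decodes Java's modified UTF-8, so standard 4-byte UTF-8 sequences (characters outside the BMP, such as emoji) are rejected with a UTFDataFormatException, and the length cannot exceed 65535 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class PrefixedUtf {
    // Hypothetical helper: 2-byte length prefix (high byte first) + exact copy of data.
    static byte[] withLengthPrefix(byte[] data) {
        if (data.length > 0xFFFF) {
            throw new IllegalArgumentException("readUTF cannot handle more than 65535 bytes");
        }
        byte[] data1 = new byte[data.length + 2];
        data1[0] = (byte) (data.length >>> 8); // high byte
        data1[1] = (byte) data.length;         // low byte
        System.arraycopy(data, 0, data1, 2, data.length);
        return data1;
    }

    // Reads data1 the same way as in the question.
    static String readUtf(byte[] data1) {
        try (DataInputStream di = new DataInputStream(new ByteArrayInputStream(data1))) {
            return di.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "中国".getBytes(StandardCharsets.UTF_8);
        String expected = new String(data, StandardCharsets.UTF_8);
        String actual = readUtf(withLengthPrefix(data));
        System.out.println(expected.equals(actual)); // true
    }
}
```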
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;

public class WriteUtfDemo {
    // Print each byte as unsigned hex; "& 0xFF" avoids the sign-extended "ffffffe4" form
    public static void print(byte[] arr){
        for(int i=0;i<arr.length;i++){
            if(i > 0 ){
                System.out.print(",");
            }
            System.out.print(Integer.toHexString(arr[i] & 0xFF));
        }
        System.out.println();
    }
    public static void main(String[] args)throws Exception {
        String str = "中国";
        byte[] arr = str.getBytes("UTF-8");
        print(arr);//e4,b8,ad,e5,9b,bd
        String str2 = new String(arr,"utf-8");
        System.out.println(str2);//中国
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeUTF(str);
        byte[] arr2 = bos.toByteArray();
        print(arr2);//0,6,e4,b8,ad,e5,9b,bd
    }
}
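As you can see, when writeUTF writes into the ByteArrayOutputStream, the byte array gets two extra bytes at the front. What are they for? Together they hold the number of bytes the encoded string occupies, because different characters take different numbers of bytes in UTF-8. From the DataOutputStream source: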
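To confirm that the two extra bytes are a byte count rather than a character count, this sketch (class and method names are illustrative) decodes the prefix that writeUTF produces. For "中国" it is 6, matching the UTF-8 byte length, while str.length() is only 2, so a prefix filled in from str.length() would be too short.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class LengthPrefix {
    // Writes str with writeUTF and decodes the 2-byte prefix as an unsigned big-endian value.
    static int prefixOf(String str) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            dos.writeUTF(str);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] out = bos.toByteArray();
        return ((out[0] & 0xFF) << 8) | (out[1] & 0xFF);
    }

    public static void main(String[] args) {
        String str = "中国";
        System.out.println(str.length());                                // 2 (chars)
        System.out.println(str.getBytes(StandardCharsets.UTF_8).length); // 6 (bytes)
        System.out.println(prefixOf(str));                               // 6
    }
}
```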
for (int i = 0; i < strlen; i++) {
    c = str.charAt(i);
    if ((c >= 0x0001) && (c <= 0x007F)) {// one byte
        utflen++;
    } else if (c > 0x07FF) {// three bytes; a Chinese character takes three bytes
        utflen += 3;
    } else {// two bytes
        utflen += 2;
    }
}
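The counting rule in the excerpt can be run on its own. This sketch copies it into a hypothetical `utfLength` method and compares it with what writeUTF actually emits and with standard UTF-8 from getBytes. Note the lower bound 0x0001: the NUL character falls into the two-byte branch, and each half of a surrogate pair counts as three bytes. Those two cases are where Java's modified UTF-8 differs from standard UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Length {
    // Same counting rule as the DataOutputStream excerpt: 1, 2 or 3 bytes per char.
    static int utfLength(String str) {
        int utflen = 0;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                utflen += 1; // ASCII except NUL
            } else if (c > 0x07FF) {
                utflen += 3; // CJK, and each half of a surrogate pair
            } else {
                utflen += 2; // includes the NUL character (0)
            }
        }
        return utflen;
    }

    // Body size actually produced by writeUTF (total output minus the 2-byte prefix).
    static int writeUtfBodySize(String str) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            dos.writeUTF(str);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.size() - 2;
    }

    public static void main(String[] args) {
        String[] samples = {"abc", "中国", "\u0000", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println(utfLength(s) + " " + writeUtfBodySize(s) + " "
                    + s.getBytes(StandardCharsets.UTF_8).length);
        }
    }
}
```

Running it prints `3 3 3`, `6 6 6`, `2 2 1` and `6 6 4`: the first two columns always agree, and only the NUL and emoji samples differ from standard UTF-8.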
So when reading, the length is likewise read first. From the DataInputStream source:
public final static String readUTF(DataInput in) throws IOException {
    int utflen = in.readUnsignedShort();
    // ... (rest of the method omitted)
}

// readUnsignedShort():
public final int readUnsignedShort() throws IOException {
    int ch1 = in.read();
    int ch2 = in.read();
    if ((ch1 | ch2) < 0)
        throw new EOFException();
    return (ch1 << 8) + (ch2 << 0);
}
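readUnsignedShort returns (ch1 << 8) + ch2, so the first byte in the stream is the high-order byte and the length prefix must be big-endian. The sketch below (names are illustrative) feeds the same two bytes in both orders. If data1's prefix had been written low byte first, which is only a guess at the cause in the question, readUTF would see 1536 instead of 6 and then fail with an EOFException, because the array is far shorter than that.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class PrefixByteOrder {
    // Reads the first two bytes the same way readUTF does: high byte first, unsigned.
    static int readPrefix(byte[] bytes) {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return dis.readUnsignedShort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readPrefix(new byte[] {0, 6})); // 6    (big-endian, as readUTF expects)
        System.out.println(readPrefix(new byte[] {6, 0})); // 1536 (bytes swapped)
    }
}
```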
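import java.io.ByteArrayInputStream;
import java.io.DataInputStream;

public class ReadUtfDemo {
    public static void main(String[] args) throws Exception {
        byte[] b = new byte[5];
        b[0] = 0;  // length prefix, high byte
        b[1] = 3;  // length prefix, low byte: 3 bytes follow
        b[2] = 97; // 'a'
        b[3] = 98; // 'b'
        b[4] = 99; // 'c'
        ByteArrayInputStream bais = new ByteArrayInputStream(b);
        DataInputStream dis = new DataInputStream(bais);
        System.out.println(dis.readUTF()); // abc
    }
}
That works fine. Are you sure your byte array is correct?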
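To check that, one option is to compare your data1 with the bytes writeUTF itself produces for the string you expect. This sketch assumes Java 9 or later for Arrays.mismatch; `expectedBytes` and `firstMismatch` are hypothetical helper names. A result of -1 means the arrays are identical; otherwise it is the index of the first differing byte (0 or 1 points at the length prefix), or the length of the shorter array if one is a prefix of the other.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class CheckData1 {
    // The bytes writeUTF would produce for the expected string.
    static byte[] expectedBytes(String expected) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            dos.writeUTF(expected);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Index of the first differing byte, or -1 if data1 matches exactly.
    static int firstMismatch(String expected, byte[] data1) {
        return Arrays.mismatch(expectedBytes(expected), data1);
    }

    public static void main(String[] args) {
        byte[] good = {0, 3, 97, 98, 99};
        byte[] swapped = {3, 0, 97, 98, 99}; // length prefix written low byte first
        System.out.println(firstMismatch("abc", good));    // -1
        System.out.println(firstMismatch("abc", swapped)); // 0
    }
}
```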