{
String test = "放苹果\",";
String tt = new String(test.getBytes("utf-8"));
System.out.println(new String(tt.getBytes(),"utf-8"));
}
Output: 放苹??,
String tt = new String(test.getBytes("GBK"));
System.out.println(new String(tt.getBytes(),"GBK"));
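Because when Unicode is converted to UTF-8, each Chinese character takes 3 bytes, so here that is 3*3 = 9 bytes.
But when you decode those bytes back into a String without naming a charset, the platform default is used (apparently GBK here), which reads Chinese text two bytes at a time. 9 is not divisible by 2, so the last byte is left unpaired and the tail comes out garbled.
With an even number of Chinese characters the UTF-8 version usually displays fine. Try it if you don't believe me.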
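The byte counting above can be checked with a small sketch. It assumes the poster's platform default charset was GBK and spells that charset out explicitly, so the result does not depend on the machine it runs on. The class and method names (`RoundTripDemo`, `lossyRoundTrip`) are made up for illustration, and the sample string is written with `\u` escapes so the file compiles under any source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Assumption: the default charset on the poster's machine was GBK.
    static final Charset GBK = Charset.forName("GBK");

    // Repeats the two steps from the question with the default charset written out.
    static String lossyRoundTrip(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // Equivalent to new String(bytes) on a GBK platform: UTF-8 bytes read as GBK.
        String misdecoded = new String(utf8, GBK);
        // Equivalent to tt.getBytes() on a GBK platform; invalid input has already been replaced.
        byte[] reencoded = misdecoded.getBytes(GBK);
        return new String(reencoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The question's sample: three Chinese characters followed by a quote and a comma.
        String test = "\u653E\u82F9\u679C\",";
        // 3 characters * 3 bytes + 2 ASCII bytes
        System.out.println(test.getBytes(StandardCharsets.UTF_8).length); // prints 11
        // The ninth byte has no valid partner in GBK, so the original text is not recovered.
        System.out.println(test.equals(lossyRoundTrip(test))); // prints false
    }
}
```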
So I have to use
page.getHTTP().getBody().getBytes("gbk"),"utf8")
to get the Chinese content, but the last character of an odd-length run always comes out garbled.
Is there any way to avoid this?
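One way to avoid the problem, sketched here under the assumption that the raw response bytes are still available, is to decode them once with the charset they were actually encoded in. If a library has already turned the bytes into a String, the trick only works when that decoding was lossless, which is the case for ISO-8859-1 because it maps every byte to exactly one char. The names `CharsetFix`, `decodeUtf8` and `repairLatin1` are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class CharsetFix {
    // Preferred: decode the raw bytes with the charset that produced them.
    static String decodeUtf8(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    // Fallback: if UTF-8 bytes were decoded as ISO-8859-1, the original bytes
    // can be recovered exactly and decoded again as UTF-8.
    static String repairLatin1(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three Chinese characters followed by a quote and a comma.
        String text = "\u653E\u82F9\u679C\",";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println(text.equals(decodeUtf8(utf8))); // prints true

        String asLatin1 = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(text.equals(repairLatin1(asLatin1))); // prints true
    }
}
```

The same repair does not work through GBK, because GBK rejects some byte pairs and replaces them during decoding, so the original bytes are already gone.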
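Unicode is not ISO-8859-1, despite what is often said: ISO-8859-1 is a single-byte encoding that covers only the first 256 Unicode code points.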
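I think it has to be converted to GBK. I remember there is a function that does the conversion, but I haven't looked at J2EE in a long time and have forgotten it. The poster in reply #5 can help you.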
// Encodes each char as a 3-byte UTF-8 sequence by editing its bits as a string.
// Despite the name, the input is an ordinary Java String, not GBK bytes.
// Caveat: only correct for characters in U+0800..U+FFFF (which covers common
// Chinese characters); ASCII and surrogate pairs would be encoded wrongly.
public static byte[] gbk2utf8(String chenese) {
    char[] c = chenese.toCharArray();
    byte[] fullByte = new byte[3 * c.length];
    for (int i = 0; i < c.length; i++) {
        int m = (int) c[i];
        String word = Integer.toBinaryString(m);
        StringBuffer sb = new StringBuffer();
        int len = 16 - word.length();
        // pad with leading zeros to 16 bits
        for (int j = 0; j < len; j++) {
            sb.append("0");
        }
        sb.append(word);
        // insert the UTF-8 marker bits: 1110xxxx 10xxxxxx 10xxxxxx
        sb.insert(0, "1110");
        sb.insert(8, "10");
        sb.insert(16, "10");
        String s1 = sb.substring(0, 8);
        String s2 = sb.substring(8, 16);
        String s3 = sb.substring(16);
        byte b0 = Integer.valueOf(s1, 2).byteValue();
        byte b1 = Integer.valueOf(s2, 2).byteValue();
        byte b2 = Integer.valueOf(s3, 2).byteValue();
        fullByte[i * 3] = b0;
        fullByte[i * 3 + 1] = b1;
        fullByte[i * 3 + 2] = b2;
    }
    return fullByte;
}
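For comparison, the standard library can produce the same bytes without any bit manipulation. The sketch below is not the poster's method. It uses `String.getBytes` with `StandardCharsets.UTF_8`, which also handles characters that need one, two, or four bytes. The class name `Utf8Bytes` and the helper `toUtf8` are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    // Lets the JDK pick the right sequence length for every character.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three Chinese characters: 3 bytes each.
        System.out.println(toUtf8("\u653E\u82F9\u679C").length); // prints 9
        // The same text followed by a quote and a comma: ASCII takes 1 byte each.
        System.out.println(toUtf8("\u653E\u82F9\u679C\",").length); // prints 11

        // First character U+653E encodes to E6 94 BE.
        StringBuilder hex = new StringBuilder();
        for (byte b : toUtf8("\u653E")) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // prints E6 94 BE
    }
}
```

A hand-written encoder that always emits three bytes per char would return 15 bytes for the second string, which is why the library call is usually the safer choice.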