Here's the code:
import java.net.*;

public class Decode {
    public static void main(String[] args) throws Exception {
        String str = "大海猪";
        // Percent-encode the UTF-8 bytes, then decode them again
        String a = URLEncoder.encode(str, "UTF-8");
        System.out.println("a=" + a);
        String b = URLDecoder.decode(a, "UTF-8");
        System.out.println("b=" + b);
        // Read the UTF-8 bytes of b as if they were GBK
        String c = new String(b.getBytes("utf-8"), "GBK");
        System.out.println("c=" + c);
        // Try to undo that: take the GBK bytes of c and read them as UTF-8
        String d = new String(c.getBytes("GBK"), "utf-8");
        System.out.println("d=" + d);
        String e = new String(c.getBytes("utf-8"), "GBK");
        System.out.println("e=" + e);
    }
}
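When str has an odd number of Chinese characters, the conversion doesn't come back correctly; with an even number it works fine. Could someone who knows character encodings explain why, and suggest a fix? Thanks!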
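A minimal reproduction of the odd/even pattern described above. It assumes Java 7+ (for `StandardCharsets`) and a JDK that ships the GBK charset, as standard OpenJDK builds do. `roundTrip` is a hypothetical helper that repeats the question's steps c and d; the `a`/`b` steps are skipped because `URLEncoder`/`URLDecoder` with the same charset give back the original string.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class OddEvenRepro {
    // Assumption: the "GBK" charset is available in this JDK.
    static final Charset GBK = Charset.forName("GBK");

    // Steps c and d from the question: read UTF-8 bytes as GBK, then try to undo it.
    static String roundTrip(String s) {
        String c = new String(s.getBytes(StandardCharsets.UTF_8), GBK);
        return new String(c.getBytes(GBK), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] samples = {"大", "大海", "大海猪"};
        for (String s : samples) {
            int n = s.getBytes(StandardCharsets.UTF_8).length;
            System.out.println(s + ": " + n + " UTF-8 bytes, round trip ok = "
                    + roundTrip(s).equals(s));
        }
    }
}
```

The byte count printed next to each sample is the thing to watch: the failures line up with an odd number of UTF-8 bytes.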
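System.out.println(getByteLengthByEncoding("大", "UTF-8"));
System.out.println(getByteLengthByEncoding("大", "GBK"));

// Returns how many bytes str takes up in the given encoding
public static int getByteLengthByEncoding(String str, String encoding) {
    try {
        return str.getBytes(encoding).length;
    } catch (UnsupportedEncodingException e) {
        throw new RuntimeException(e);
    }
}

A Chinese character like 大 takes 3 bytes in UTF-8 but 2 bytes in GBK. Now look at this:

String str = "大海猪";
displayCharsetEncodingByte(str, "UTF-8");
displayCharsetEncodingByte(str, "GBK");

// Prints the bytes of str in the given encoding, as signed decimals
public static void displayCharsetEncodingByte(String str, String encoding) {
    try {
        byte[] byteArr = str.getBytes(encoding);
        for (byte b : byteArr) {
            System.out.print(b);
            System.out.print(" ");
        }
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    System.out.println();
}

Different encodings produce different numbers of bytes and different byte values. With n Chinese characters, UTF-8 produces 3n bytes, and GBK decodes these non-ASCII bytes two at a time, so when n is odd, one byte is always left over at the end. That is why only odd counts fail.

When you decode with String c = new String(b.getBytes("utf-8"), "GBK");, that leftover last byte cannot form a GBK character, so the decoder replaces it with U+FFFD (the Unicode replacement character). GBK has no code for U+FFFD, so c.getBytes("GBK") writes '?' in its place, and the original byte is effectively thrown away.

At String d = new String(c.getBytes("GBK"), "utf-8");, the first two characters display correctly. The third character's first two bytes, however, are now followed by '?' instead of their original final byte; that is not valid UTF-8, so those two bytes cannot be decoded either and 猪 is lost. At String e = new String(d.getBytes("utf-8"), "GBK");, all the bytes can be decoded as GBK, and from this step on there's no further problem, because no more bytes get thrown away.

If GBK could decode that last UTF-8 byte, the problem would not occur. Note that mojibake is still made of real characters that exist in the character set; garbled text survives a conversion as long as the encoding can represent those characters.

Here is the complete test:

import java.io.UnsupportedEncodingException;

public class DecodeDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {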
        String str = "大海猪";
        System.out.println(getByteLengthByEncoding("大", "UTF-8"));
        System.out.println(getByteLengthByEncoding("大", "GBK"));
        System.out.println();
        displayDECUnicode(str);
        displayHEXUnicode(str);
        displayCharsetEncodingByte(str, "UTF-8");
        displayCharsetEncodingByte(str, "GBK");
        System.out.println();

        // Step c: read the UTF-8 bytes as if they were GBK
        displayCharsetEncodingByte(str, "UTF-8");
        String c = new String(str.getBytes("utf-8"), "GBK");
        System.out.println("c=" + c);
        displayDECUnicode(c);
        System.out.println();

        // Step d: encode c back to GBK and read those bytes as UTF-8
        displayCharsetEncodingByte(c, "GBK");
        String d = new String(c.getBytes("GBK"), "utf-8");
        System.out.println("d=" + d);
        displayDECUnicode(d);
        System.out.println();

        // Step e: read the UTF-8 bytes of d as GBK again
        displayCharsetEncodingByte(d, "UTF-8");
        String e = new String(d.getBytes("utf-8"), "GBK");
        System.out.println("e=" + e);
        displayDECUnicode(e);
        System.out.println();
    }

    // Prints each char's UTF-16 code unit in decimal
    public static void displayDECUnicode(String str) {
        char[] charStr = str.toCharArray();
        for (char c : charStr) {
            System.out.print((int) c);
            System.out.print(" ");
        }
        System.out.println();
    }

    // Prints each char's code unit as uppercase hex, in escape-sequence form
    public static void displayHEXUnicode(String str) {
        char[] charStr = str.toCharArray();
        for (char c : charStr) {
            System.out.print("\\u" + Integer.toHexString(c).toUpperCase());
            System.out.print(" ");
        }
        System.out.println();
    }

    // Prints the bytes of str in the given encoding, as signed decimals
    public static void displayCharsetEncodingByte(String str, String encoding) {
        try {
            byte[] byteArr = str.getBytes(encoding);
            for (byte b : byteArr) {
                System.out.print(b);
                System.out.print(" ");
            }
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        System.out.println();
    }

    // Returns how many bytes str takes up in the given encoding
    public static int getByteLengthByEncoding(String str, String encoding) {
        try {
            return str.getBytes(encoding).length;
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e);
        }
    }
}
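The answer above says the last byte gets thrown away. This sketch prints the bytes at each step to show what that means concretely. It assumes Java 7+ and a JDK with the GBK charset; `hex` is a hypothetical helper. The `String(byte[], Charset)` constructor and `getBytes(Charset)` both replace bad input instead of throwing: the unpaired ninth byte becomes U+FFFD, and because GBK cannot encode U+FFFD, re-encoding writes GBK's replacement byte `?` (0x3F).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TrailingByteDemo {
    // Assumption: the "GBK" charset is available in this JDK.
    static final Charset GBK = Charset.forName("GBK");

    // Hypothetical helper: formats bytes as space-separated uppercase hex, e.g. "E5 A4 A7".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "大海猪".getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 bytes:    " + hex(utf8));

        // GBK reads non-ASCII bytes in pairs; the ninth byte has no partner,
        // so the decoder substitutes U+FFFD for it.
        String c = new String(utf8, GBK);
        char last = c.charAt(c.length() - 1);
        System.out.printf("last char of c: U+%04X%n", (int) last);

        // U+FFFD has no GBK mapping, so getBytes writes '?' (0x3F) in its place.
        byte[] gbk = c.getBytes(GBK);
        System.out.println("GBK bytes of c: " + hex(gbk));
    }
}
```

Comparing the first and last lines of output shows the single byte that no longer matches, which is exactly the byte step d needs to rebuild 猪.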
Change "大海猪" to "大海a" and see whether it works then.
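Why "大海a" works: 'a' is ASCII, a single byte (0x61) in both UTF-8 and GBK, and the six UTF-8 bytes of 大海 pair up exactly, so no byte is left dangling and 'a' decodes as itself. A small sketch that prints each string's byte count in both encodings and whether the question's round trip survives. It assumes Java 7+ and a JDK with GBK; `survives` is a hypothetical helper.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AsciiTailCheck {
    // Assumption: the "GBK" charset is available in this JDK.
    static final Charset GBK = Charset.forName("GBK");

    // Hypothetical helper: true if UTF-8 -> (read as GBK) -> GBK -> (read as UTF-8) is lossless.
    static boolean survives(String s) {
        String c = new String(s.getBytes(StandardCharsets.UTF_8), GBK);
        return new String(c.getBytes(GBK), StandardCharsets.UTF_8).equals(s);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"大海猪", "大海a"}) {
            System.out.printf("%s: UTF-8 %d bytes, GBK %d bytes, survives = %b%n",
                    s,
                    s.getBytes(StandardCharsets.UTF_8).length,
                    s.getBytes(GBK).length,
                    survives(s));
        }
    }
}
```

The 'a' decodes on its own only because it comes after a whole number of byte pairs.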
a=%C2%B4%C3%B3%C2%BA%C2%A3%C3%96%C3%AD
b=大海猪
c=??????
d=大海猪
e=?????????

C:\Documents and Settings\wfgp83\Desktop>javac Test.java
C:\Documents and Settings\wfgp83\Desktop>java Test
a=%C2%B4%C3%B3%C2%BA%C2%A3%C3%96%C3%AD%C3%90%C2%A1
b=大海猪小
c=????????
d=大海猪小
e=????????????

It works fine when I test it on my machine (JDK 1.6.23).
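The `a=` lines in that output are not the UTF-8 encoding of 大海猪 (which would be `%E5%A4%A7%E6%B5%B7%E7%8C%AA`). They match what you get if each GBK byte of the source file was read as one ISO-8859-1 character, i.e. if javac used ISO-8859-1 as the source encoding. That cause is an assumption, but the sketch below reproduces both `a=` strings exactly. It needs Java 7+ and a JDK with GBK; `misreadAsLatin1` and `urlEncodeUtf8` are hypothetical helpers.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MisreadLiteralDemo {
    // Assumption: the "GBK" charset is available in this JDK.
    static final Charset GBK = Charset.forName("GBK");

    // Simulates javac reading a GBK-saved literal as ISO-8859-1: every byte becomes one char.
    static String misreadAsLatin1(String text) {
        return new String(text.getBytes(GBK), StandardCharsets.ISO_8859_1);
    }

    // URLEncoder.encode(String, String) declares a checked exception even for "UTF-8".
    static String urlEncodeUtf8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println("correct: a=" + urlEncodeUtf8("大海猪"));
        System.out.println("misread: a=" + urlEncodeUtf8(misreadAsLatin1("大海猪")));
        System.out.println("misread: a=" + urlEncodeUtf8(misreadAsLatin1("大海猪小")));
    }
}
```

Because the misread literal is 6 (or 8) Latin-1 characters that each take 2 bytes in UTF-8, step c sees an even byte count here. That would explain why this run does not show the odd/even failure.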
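String str = "大海猪"; is Java source code. At run time the JVM loads the class file, including the Chinese text in it, and inside the JVM characters are always represented in Unicode, which is why Java can handle the characters of every language.
When the source is compiled, javac decodes the .java file's bytes using the source encoding (the -encoding option, or a default if it is omitted) and stores the literal "大海猪" in the class file as (modified) UTF-8. The JVM therefore gets the correct "大海猪" only if that source encoding matches the encoding the file was actually saved in. Everything below happens after the JVM has loaded the characters. I suggest the OP look into these two operations:
URLEncoder.encode(str, "UTF-8");
new String(b.getBytes("utf-8"), "GBK");
Ideally, once you understand byte streams, character sets, char and byte, read the internal implementation of the String API.
For example, start from String's getBytes() and new String(bytes, "xxx") and see what String actually provides:
> String gives Java the ability to convert between chars and bytes, and which converter ("driver") is used depends on the character set. That's all I'll say.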
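In practice that means a `String` already holds decoded characters, so bytes should only be decoded with the charset they were written in. When bytes are needed in another charset, decode first and then encode. A minimal sketch of converting UTF-8 bytes to GBK bytes that way, assuming Java 7+ and a JDK with GBK (`utf8ToGbk` is a hypothetical helper name). Unlike `new String(b.getBytes("utf-8"), "GBK")` from the question, it works for any number of characters that GBK can represent.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConvertBytes {
    // Assumption: the "GBK" charset is available in this JDK.
    static final Charset GBK = Charset.forName("GBK");

    // Decode with the charset the bytes were written in, then encode with the target charset.
    static byte[] utf8ToGbk(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8); // bytes -> chars
        return text.getBytes(GBK);                                   // chars -> bytes
    }

    public static void main(String[] args) {
        for (String s : new String[] {"大海", "大海猪", "大海a"}) {
            byte[] gbk = utf8ToGbk(s.getBytes(StandardCharsets.UTF_8));
            // Reading the GBK bytes back with GBK recovers the original text.
            System.out.println(s + " -> " + gbk.length + " GBK bytes, intact = "
                    + new String(gbk, GBK).equals(s));
        }
    }
}
```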