Calling all character-encoding experts! Waiting online for an answer

Straight to the code:
import java.net.*;

public class Decode {
    public static void main(String[] args) throws Exception {
        String str = "大海猪";
        // Percent-encode the UTF-8 bytes of str, then decode them back
        String a = URLEncoder.encode(str, "UTF-8");
        System.out.println("a=" + a);
        String b = URLDecoder.decode(a, "UTF-8");
        System.out.println("b=" + b);
        // Encode as UTF-8 but decode those bytes as GBK (deliberate mismatch)
        String c = new String(b.getBytes("utf-8"), "GBK");
        System.out.println("c=" + c);
        // Try to undo it: re-encode as GBK, decode as UTF-8
        String d = new String(c.getBytes("GBK"), "utf-8");
        System.out.println("d=" + d);
        String e = new String(c.getBytes("utf-8"), "GBK");
        System.out.println("e=" + e);
    }
}
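When str contains an odd number of Chinese characters the conversion fails, but with an even number it works. Could an encoding expert please explain why, and suggest a solution? Thanks!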
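A minimal sketch that reproduces the odd/even behaviour with the same conversion chain as c and d in the question (UTF-8 bytes read as GBK, then reversed). It assumes Java 7+ for StandardCharsets and a JRE that ships the GBK charset; the class name OddEvenRepro and the two test strings are chosen only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class OddEvenRepro {
    // Encode as UTF-8, decode those bytes as GBK, then undo both steps
    static String mangleAndRestore(String s) {
        Charset gbk = Charset.forName("GBK");
        String asGbk = new String(s.getBytes(StandardCharsets.UTF_8), gbk);
        return new String(asGbk.getBytes(gbk), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "大海" has an even number of characters, "大海猪" an odd number
        for (String s : new String[] {"大海", "大海猪"}) {
            int n = s.getBytes(StandardCharsets.UTF_8).length;
            String back = mangleAndRestore(s);
            System.out.println(s + " (" + n + " UTF-8 bytes) -> " + back
                    + ", restored: " + back.equals(s));
        }
    }
}
```

With 大海 the 6 UTF-8 bytes split cleanly into three 2-byte GBK units; with 大海猪 the 9th byte is left without a partner, and only the last character is damaged.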

Replies:


The problem seems to come from the new String calls. Could you replace all of the later new String calls with URLEncoder.encode or URLDecoder.decode?
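A sketch of that suggestion, assuming the only goal is to carry the text through a channel that accepts ASCII. Percent-encoding is lossless for any number of characters because the bytes are never reinterpreted as GBK. The class and method names are hypothetical; URLEncoder.encode(String, String) and URLDecoder.decode(String, String) are the standard JDK methods.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class UrlRoundTrip {
    // Percent-encode the string's UTF-8 bytes (the result is plain ASCII)
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JRE supports UTF-8
        }
    }

    // Reverse it with the same charset
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"大海", "大海猪", "大海猪小"}) {
            String enc = encode(s);
            System.out.println(enc + " -> " + decode(enc) + ", restored: " + decode(enc).equals(s));
        }
    }
}
```

Note that this does not turn the text into GBK; if GBK bytes are really needed, they should come from calling getBytes("GBK") on the original String.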
This is a bug in UTF-8 encoding; let's hope it gets fixed.
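The workaround is to use ISO-8859-1 or Unicode as the intermediate form; then you can convert back and forth. The cause: a Chinese character takes 3 bytes in UTF-8 but 2 bytes in GBK. When three characters of UTF-8 (9 bytes) are decoded as GBK, the 9th byte cannot form a Chinese character on its own, yet it is not a valid single-byte character either. I don't know the exact mechanism, but the last byte, originally 0xAA (binary 10101010), comes back as 0x3F ('?', binary 00111111) after the conversion. The byte has changed, so when you convert back, the last character is garbled.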
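ISO-8859-1 works as an intermediate because it maps each of the 256 byte values to exactly one char and back, so any byte sequence survives a decode/encode cycle. A sketch under Java 7+ comparing that with GBK on the 9 UTF-8 bytes of 大海猪; the class name Latin1Bridge and the helper viaCharset are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1Bridge {
    // Decode arbitrary bytes into a String and encode them back with the same charset
    static byte[] viaCharset(byte[] bytes, Charset cs) {
        return new String(bytes, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        byte[] utf8 = "大海猪".getBytes(StandardCharsets.UTF_8); // 9 bytes
        byte[] viaLatin1 = viaCharset(utf8, StandardCharsets.ISO_8859_1);
        byte[] viaGbk = viaCharset(utf8, Charset.forName("GBK"));

        // ISO-8859-1: one byte <-> one char, so nothing is lost
        System.out.println("ISO-8859-1 lossless: " + Arrays.equals(utf8, viaLatin1));
        // GBK: the lone trailing byte 0xAA becomes U+FFFD, which re-encodes as '?' (0x3F)
        System.out.printf("GBK last byte: %02X -> %02X%n",
                utf8[utf8.length - 1] & 0xFF, viaGbk[viaGbk.length - 1] & 0xFF);
        // The original text is intact after the ISO-8859-1 detour
        System.out.println(new String(viaLatin1, StandardCharsets.UTF_8));
    }
}
```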
It comes down to how many bytes each encoding uses per character.
System.out.println(getByteLengthByEncoding("大","UTF-8"));
System.out.println(getByteLengthByEncoding("大","GBK"));

public static int getByteLengthByEncoding(String str, String encoding){
    try {
        return str.getBytes(encoding).length;
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
        throw new RuntimeException();
    }
}

A Chinese character takes 3 bytes in UTF-8 and 2 bytes in GBK. Now look at this:
String str = "大海猪";
displayCharsetEncodingByte(str,"UTF-8");
displayCharsetEncodingByte(str,"GBK");

public static void displayCharsetEncodingByte(String str,String encoding){
    try {
        byte[] byteArr = str.getBytes(encoding);
        for(byte b : byteArr){
            System.out.print(b);
            System.out.print(" ");
        }
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    System.out.println();
}

Different encodings give different byte counts and different bytes. When you decode with String c = new String (b.getBytes("utf-8"),"GBK");, the last byte is lost.
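That is because GBK cannot decode that final byte on its own, so it is replaced by the substitution character U+FFFD. At the step String d = new String (c.getBytes("GBK"),"utf-8"); two Chinese characters display correctly, but because the previous step lost a byte, the last two bytes no longer form valid UTF-8 and are lost as well. At the step String e = new String (d.getBytes("utf-8"),"GBK"); every byte can be decoded as GBK, and nothing further goes wrong because no more bytes are lost. If GBK could decode the last UTF-8 byte, the problem would not occur. Note that mojibake is still made of characters: they exist in the character set, and they survive as long as the encoding supports them. Here is the complete test:
import java.io.UnsupportedEncodingException;

public class DecodeDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {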
        String str = "大海猪";
        System.out.println(getByteLengthByEncoding("大", "UTF-8"));
        System.out.println(getByteLengthByEncoding("大", "GBK"));
        System.out.println();

        displayDECUnicode(str);
        displayHEXUnicode(str);
        displayCharsetEncodingByte(str, "UTF-8");
        displayCharsetEncodingByte(str, "GBK");
        System.out.println();

        displayCharsetEncodingByte(str, "UTF-8");
        String c = new String(str.getBytes("utf-8"), "GBK");
        System.out.println("c=" + c);
        displayDECUnicode(c);
        System.out.println();

        displayCharsetEncodingByte(c, "GBK");
        String d = new String(c.getBytes("GBK"), "utf-8");
        System.out.println("d=" + d);
        displayDECUnicode(d);
        System.out.println();

        displayCharsetEncodingByte(d, "UTF-8");
        String e = new String(d.getBytes("utf-8"), "GBK");
        System.out.println("e=" + e);
        displayDECUnicode(e);
        System.out.println();
    }

    public static void displayDECUnicode(String str) {
        char[] charStr = str.toCharArray();
        for (char c : charStr) {
            System.out.print((int) c);
            System.out.print(" ");
        }
        System.out.println();
    }

    public static void displayHEXUnicode(String str) {
        char[] charStr = str.toCharArray();
        for (char c : charStr) {
            System.out.print("\\u" + Integer.toHexString(c).toUpperCase());
            System.out.print(" ");
        }
        System.out.println();
    }

    public static void displayCharsetEncodingByte(String str, String encoding) {
        try {
            byte[] byteArr = str.getBytes(encoding);
            for (byte b : byteArr) {
                System.out.print(b);
                System.out.print(" ");
            }
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        System.out.println();
    }

    public static int getByteLengthByEncoding(String str, String encoding) {
        try {
            return str.getBytes(encoding).length;
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
            throw new RuntimeException();
        }
    }
}
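Because new String(bytes, charset) always substitutes bad input silently, one way to detect the byte loss described above is to decode through a CharsetDecoder configured with CodingErrorAction.REPORT, which throws instead. A sketch, with StrictDecode and decodeStrict as hypothetical names; it only detects the problem, and the real fix is still to decode bytes with the charset they were encoded in.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // Decode strictly: report malformed or unmappable input instead of replacing it
    static String decodeStrict(byte[] bytes, Charset cs) throws CharacterCodingException {
        CharsetDecoder dec = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return dec.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        for (String s : new String[] {"大海", "大海猪"}) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            try {
                decodeStrict(utf8, gbk);
                System.out.println(s + ": all " + utf8.length + " bytes decoded as GBK");
            } catch (CharacterCodingException e) {
                System.out.println(s + ": decoding as GBK would lose data (" + e + ")");
            }
        }
    }
}
```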
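Quoting the OP: "When str contains an odd number of Chinese characters the conversion fails, but with an even number it works."
Change "大海猪" to "大海a" and see whether it works then.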
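Why 大海a works even though its UTF-8 form is also an odd number of bytes (3 + 3 + 1 = 7): the leftover byte is 0x61 ('a'), which GBK decodes as a single-byte character, so nothing is lost. What matters is whether the final byte can stand alone in GBK, not the character count as such. A sketch using the question's conversion chain; the class name TrailingByteCheck is hypothetical and Java 7+ is assumed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TrailingByteCheck {
    // Same round trip as c and d in the question: UTF-8 -> GBK -> GBK -> UTF-8
    static String mangleAndRestore(String s) {
        Charset gbk = Charset.forName("GBK");
        String asGbk = new String(s.getBytes(StandardCharsets.UTF_8), gbk);
        return new String(asGbk.getBytes(gbk), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"大海猪", "大海a"}) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            // Both strings have an odd byte count; only the kind of trailing byte differs
            System.out.printf("%s: %d bytes, last byte %02X, restored=%b%n",
                    s, utf8.length, utf8[utf8.length - 1] & 0xFF,
                    mangleAndRestore(s).equals(s));
        }
    }
}
```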
C:\Documents and Settings\wfgp83\Desktop>java Test
a=%C2%B4%C3%B3%C2%BA%C2%A3%C3%96%C3%AD
b=大海猪
c=??????
d=大海猪
e=?????????
C:\Documents and Settings\wfgp83\Desktop>javac Test.java
C:\Documents and Settings\wfgp83\Desktop>java Test
a=%C2%B4%C3%B3%C2%BA%C2%A3%C3%96%C3%AD%C3%90%C2%A1
b=大海猪小
c=????????
d=大海猪小
e=????????????
It works fine when I test it on my machine, JDK 1.6.23.
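The a= line hints at why this run "works". Decoding it shows that str held six Latin-1 characters (U+00B4 U+00F3 U+00BA U+00A3 U+00D6 U+00ED), which are exactly the GBK bytes of 大海猪 read one byte per char. A plausible but unconfirmed explanation is that Test.java was saved as GBK and compiled with a one-byte-per-char default encoding such as ISO-8859-1, so each Chinese character became two characters of 2 UTF-8 bytes each and the byte count was always even. The class WhatWasStr and its helpers are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WhatWasStr {
    // Undo the URL encoding to recover the String that was actually in str
    static String urlDecode(String a) {
        try {
            return URLDecoder.decode(a, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Treat each char (all <= U+00FF here) as one byte and read those bytes as GBK
    static String latin1CharsAsGbk(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), Charset.forName("GBK"));
    }

    public static void main(String[] args) {
        String a = "%C2%B4%C3%B3%C2%BA%C2%A3%C3%96%C3%AD"; // first a= line of that run
        String str = urlDecode(a);
        StringBuilder units = new StringBuilder();
        for (char ch : str.toCharArray()) {
            units.append(String.format("U+%04X ", (int) ch));
        }
        System.out.println(units.toString().trim()); // U+00B4 U+00F3 U+00BA U+00A3 U+00D6 U+00ED
        System.out.println(latin1CharsAsGbk(str));    // 大海猪
        System.out.println(str.getBytes(StandardCharsets.UTF_8).length); // 12
    }
}
```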
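Charset questions again! The OP's question reminds me that when I first started learning about character sets, I seem to have run the very same test :> nostalgic. Here is my answer: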
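String str = "大海猪"; is Java source code (src). At run time the JVM loads the class file and brings the Chinese text into the JVM, where every character is represented uniformly in Unicode; that is why Java supports the characters of every language.
When "大海猪" is compiled, javac reads the .java file's bytes using the source file's encoding (the -encoding option, or the platform default) and stores the resulting characters in the class file, which is why the JVM ends up with the correct "大海猪". Everything below happens after the JVM has loaded those characters. I suggest the OP look into the following two operations: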
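A small sketch of the point above: once loaded, the String is just a sequence of UTF-16 chars, and byte encodings only come into play at getBytes / new String. It assumes the file is compiled with the encoding it was saved in (for example javac -encoding UTF-8); the Unicode-escape literal names the same characters without depending on that. The class InsideTheJvm and the helper hexUnits are hypothetical.

```java
public class InsideTheJvm {
    // Hex value of each UTF-16 char in the string, separated by spaces
    static String hexUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char ch : s.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%04X", (int) ch));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String str = "大海猪";
        // At the API level a String is a sequence of UTF-16 chars, not bytes in some charset
        System.out.println(str.length());   // 3
        System.out.println(hexUnits(str));  // 5927 6D77 732A
        // Escapes name the same chars regardless of how the source file is encoded
        System.out.println(str.equals("\u5927\u6D77\u732A")); // true
    }
}
```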
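URLEncoder.encode(str, "UTF-8");
new String(b.getBytes("utf-8"), "GBK");
Ideally, once you understand byte streams, character sets, char and byte, take a look at how the String API is implemented internally.
For example, start from String.getBytes() and new String(bytes, "xxx") and see what String actually provides :> String gives Java the ability to convert between chars and bytes, and which converter ("driver") does the work depends on the charset. That's all I'll say.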
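A sketch of the "driver" idea, assuming Java 7+: String.getBytes(Charset) and new String(byte[], Charset) are the two directions of the same charset-specific conversion that Charset.encode exposes directly, and decoding with the same charset used for encoding returns the original text. The class StringAsDriver and the helper encodeWithCharset are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringAsDriver {
    // chars -> bytes via Charset.encode; matches String.getBytes(cs) for encodable text
    static byte[] encodeWithCharset(String s, Charset cs) {
        ByteBuffer bb = cs.encode(s);
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    public static void main(String[] args) {
        String str = "大海猪";
        for (Charset cs : new Charset[] {StandardCharsets.UTF_8, Charset.forName("GBK")}) {
            byte[] viaString = str.getBytes(cs);             // String's own encoder
            byte[] viaCharset = encodeWithCharset(str, cs);  // the same conversion, spelled out
            String back = new String(viaString, cs);         // decode with the SAME charset
            System.out.println(cs.name() + ": " + viaString.length + " bytes, identical: "
                    + Arrays.equals(viaString, viaCharset) + ", round trip: " + back.equals(str));
        }
    }
}
```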