A question about UTF-8

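I wrote some code that reads a UTF-8 encoded file from a web server.
The relevant code:
while ((str = urlReader.readLine()) != null) {
    // read str
}
resultText.setText(new String(str.toString().getBytes(),
        "UTF8"));
Note: resultText is an SWT Text, and str is the data stream read from the web server, encoded as UTF-8.
/*******************/
English text in the same file comes through fine, but the Chinese is garbled, mostly right after punctuation marks. Hoping an expert can help me out~~
Example of the bad output (copied from resultText):
??般来??,压缩档案不应包含??有档案压缩目录下,例如 Java语言文字档案和档案卷宗应排除.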

Solutions »


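Remove the "UTF8" argument and try the default.
That doesn't work~ Then it is no longer a matter of a few garbled characters; almost nothing comes out ungarbled.
A character conversion should do it! Search the web.
It's in the standard library, so why search the web?
Thanks~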
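resultText.setText(new String(str.toString().getBytes("GBK"),
        "UTF8"));
Give this a try.
resultText.setText(new String(str.toString().getBytes("GBK"),
        "UTF-8"));
> resultText.setText(new String(str.toString().getBytes(), "UTF8"));
This line is used incorrectly. Written this way, even a good string gets turned into garbage. resultText.setText(str) is all you need. If the text is garbled, the problem is in urlReader.readLine(): the String that comes out of it is already garbled.
A conceptual point: saying "str is the data stream read from the web server, encoded as UTF-8" is muddled. All you can say is "the web server sends the data stream in UTF-8 encoding". The str returned by urlReader.readLine() is a Java String object, which has no "encoding" to speak of. Turning the "data stream" into a String object is the step where the encoding has to be handled correctly, so if the text is garbled, the cause must be inside urlReader.readLine(), that is, in the charset the reader decodes the bytes with.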
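To make the point about the reader concrete, here is a minimal sketch of decoding the incoming bytes as UTF-8 at the moment the reader is built. The thread never shows how urlReader is constructed, so the class name Utf8ReadDemo, the helper readAll, and the in-memory stream standing in for the web server's response are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class Utf8ReadDemo {
    // Hypothetical helper: decodes the stream as UTF-8 and returns
    // all lines joined with '\n'.
    static String readAll(InputStream in) {
        List<String> lines = new ArrayList<>();
        // The charset is named explicitly, so the platform default is never used.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        // Stand-in for the server response: the UTF-8 bytes of a two-line text.
        // \u4e2d\u6587 is the two-character word for "Chinese".
        byte[] wire = "\u4e2d\u6587\nJava".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(wire));
        System.out.println(text.equals("\u4e2d\u6587\nJava"));
    }
}
```

With a real connection, the ByteArrayInputStream would presumably be replaced by the stream obtained from the URL, and the resulting String could be handed to resultText.setText without any further conversion.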
The following is quoted from the source of the com.sun.tools.javac.util.Convert class in OpenJDK javac 1.7:
    /** Convert `len' bytes from utf8 to characters.
     *  Parameters are as in System.arraycopy
     *  Return first index in `dst' past the last copied char.
     *  @param src        The array holding the bytes to convert.
     *  @param sindex     The start index from which bytes are converted.
     *  @param dst        The array holding the converted characters..
     *  @param dindex     The start index from which converted characters
     *                    are written.
     *  @param len        The maximum number of bytes to convert.
     */
    public static int utf2chars(byte[] src, int sindex,
                                char[] dst, int dindex,
                                int len) {
        int i = sindex;
        int j = dindex;
        int limit = sindex + len;
        while (i < limit) {
            int b = src[i++] & 0xFF;
            if (b >= 0xE0) {
                b = (b & 0x0F) << 12;
                b = b | (src[i++] & 0x3F) << 6;
                b = b | (src[i++] & 0x3F);
            } else if (b >= 0xC0) {
                b = (b & 0x1F) << 6;
                b = b | (src[i++] & 0x3F);
            }
            dst[j++] = (char)b;
        }
        return j;
    }

    /** Return bytes in Utf8 representation as an array of characters.
     *  @param src        The array holding the bytes.
     *  @param sindex     The start index from which bytes are converted.
     *  @param len        The maximum number of bytes to convert.
     */
    public static char[] utf2chars(byte[] src, int sindex, int len) {
        char[] dst = new char[len];
        int len1 = utf2chars(src, sindex, dst, 0, len);
        char[] result = new char[len1];
        System.arraycopy(dst, 0, result, 0, len1);
        return result;
    }

    /** Return all bytes of a given array in Utf8 representation
     *  as an array of characters.
     *  @param src        The array holding the bytes.
     */
    public static char[] utf2chars(byte[] src) {
        return utf2chars(src, 0, src.length);
    }

    /** Return bytes in Utf8 representation as a string.
     *  @param src        The array holding the bytes.
     *  @param sindex     The start index from which bytes are converted.
     *  @param len        The maximum number of bytes to convert.
     */
    public static String utf2string(byte[] src, int sindex, int len) {
        char dst[] = new char[len];
        int len1 = utf2chars(src, sindex, dst, 0, len);
        return new String(dst, 0, len1);
    }

    /** Return all bytes of a given array in Utf8 representation
     *  as a string.
     *  @param src        The array holding the bytes.
     */
    public static String utf2string(byte[] src) {
        return utf2string(src, 0, src.length);
    }
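As a worked example of the three-byte branch in the quoted code, the sketch below decodes the bytes E4 B8 AD by hand and compares the result with the standard library decoder. It is a standalone illustration, not part of javac; the class name Utf8ThreeByteDemo and the helper decode3 are made up here. Note that the quoted routine only distinguishes one-, two- and three-byte forms.

```java
import java.nio.charset.StandardCharsets;

class Utf8ThreeByteDemo {
    // Hypothetical helper: decodes one three-byte UTF-8 sequence
    // (lead byte in the range 0xE0..0xEF) into a single char.
    static char decode3(byte b0, byte b1, byte b2) {
        int c = (b0 & 0x0F) << 12   // low 4 bits of the lead byte
              | (b1 & 0x3F) << 6    // low 6 bits of the first continuation byte
              | (b2 & 0x3F);        // low 6 bits of the second continuation byte
        return (char) c;
    }

    public static void main(String[] args) {
        byte[] bytes = { (byte) 0xE4, (byte) 0xB8, (byte) 0xAD };
        char manual = decode3(bytes[0], bytes[1], bytes[2]);
        String viaLibrary = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(manual));     // prints 4e2d
        System.out.println(viaLibrary.charAt(0) == manual);  // prints true
    }
}
```

The arithmetic is 0x4000 + 0x0E00 + 0x2D = 0x4E2D, which is the code point U+4E2D.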
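To: maquan
> All you can say is "the web server sends the data stream in UTF-8 encoding". The str returned by urlReader.readLine() is a Java String object, which has no "encoding" to speak of.
Java uses Unicode internally, so this String object should also be stored as Unicode, right?~
KRplusSRequalGOD(狂人+善人=神), thanks to you too for posting the source code~
> Java uses Unicode internally, so this String object should also be stored as Unicode, right?~
The encoding Java uses for its internal storage is UTF-16 (effectively UCS-2 before Java 5), but that fact is transparent to the programmer. It is only an internal implementation detail of the JVM, and a Java programmer does not need to know it. While we're at it, one more conceptual point, hehe :D
Unicode is a character set. The encodings that support this character set include UCS-2, UTF-8, UTF-16 and so on. A character set and an encoding are two different things, even though they are sometimes conflated (GB2312, for example).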
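The difference between a character set and an encoding can be seen in a small sketch: the same two Unicode characters become byte sequences of different lengths depending on the encoding chosen. The class name EncodingLengthDemo and its helper methods are invented for this illustration.

```java
import java.nio.charset.StandardCharsets;

class EncodingLengthDemo {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the string occupies when encoded as UTF-16
    // (big-endian, no byte order mark).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String s = "\u4e2d\u6587";              // two Chinese characters
        System.out.println(s.length());         // prints 2
        System.out.println(utf8Length(s));      // prints 6
        System.out.println(utf16Length(s));     // prints 4
    }
}
```

The String itself stays the same in every case; only the bytes produced at the boundary change.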
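This problem needs to be solved in two steps:
1 Determine the encoding of the file you are reading
2 Determine the encoding supported by the platform you display on
For example, if the source file is encoded in GB2312 and has to be displayed on a platform that supports UTF-8, the steps are as follows
String s = new String(gb2312Data, "GB2312"); // gb2312Data: the raw byte[] read from the source file
byte[] utf8Data = s.getBytes("UTF-8");       // re-encode only if the display side needs UTF-8 bytes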
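Converting between two encodings always goes through a String: decode the bytes with the source charset, then encode the String with the target charset. Below is a minimal sketch of that idea, assuming the runtime ships the GB2312 charset (common JDK builds do, but it is not one of the charsets every Java platform must support). The class name TranscodeDemo and the helper transcode are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TranscodeDemo {
    // Hypothetical helper: reinterprets nothing, it decodes with `from`
    // and then encodes the resulting String with `to`.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String text = new String(input, from);  // bytes -> characters
        return text.getBytes(to);               // characters -> bytes
    }

    public static void main(String[] args) {
        // GB2312 bytes of the two characters U+4E2D U+6587.
        byte[] gb2312Data = { (byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4 };
        // Assumption: the GB2312 charset is available in this runtime.
        byte[] utf8Data = transcode(gb2312Data,
                Charset.forName("GB2312"), StandardCharsets.UTF_8);
        String roundTrip = new String(utf8Data, StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals("\u4e2d\u6587"));
        System.out.println(utf8Data.length);
    }
}
```

Encoding a String with one charset and decoding the bytes with a different one, as several replies in this thread suggest, skips the matching decode step and is what produces garbled text.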