Does Java's String use UTF-8 or UTF-16 under the hood these days?

thanks



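Probably UTF-8. I've been using JDOM lately, and the XML files it generates are UTF-8 by default.
It should be UTF-16. If it were UTF-8, why would a char need to be 2 bytes?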
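The "char is 2 bytes" argument can be checked directly. The sketch below assumes nothing beyond the standard library; the class and method names (`Utf16Units`, `codeUnits`, `codePoints`) are made up for illustration. It shows that `String.length()` counts 16-bit UTF-16 code units, so a character outside the Basic Multilingual Plane occupies two chars. Note that this is what the API exposes; since JDK 9 the runtime may store Latin-1-only strings more compactly internally without changing this behavior.

```java
final class Utf16Units {
    // Number of UTF-16 code units (Java chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String han = "\u4E2D";          // U+4E2D, inside the BMP
        String emoji = "\uD83D\uDE00";  // U+1F600, stored as a surrogate pair

        System.out.println(Character.SIZE);                             // 16
        System.out.println(codeUnits(han) + " " + codePoints(han));     // 1 1
        System.out.println(codeUnits(emoji) + " " + codePoints(emoji)); // 2 1
    }
}
```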
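It should be stored as UTF-8.
That favors English and similar languages. First, Java was created by English speakers, who would consider themselves before anyone else; second, most documents on the web are written in English, so this saves space.
For Chinese, Japanese, and Korean, though, it wastes space: a CJK character stored as UTF-8 takes on average 1.5 times the space it takes in UTF-16 (typically three bytes instead of two).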
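The 1.5x figure for CJK text is easy to reproduce. This is a minimal sketch with hypothetical names (`EncodedSize`, `utf8Bytes`, `utf16Bytes`); it only measures the size of the encoded byte arrays and says nothing about how the JVM lays out a String in memory.

```java
import java.nio.charset.StandardCharsets;

final class EncodedSize {
    // Size of the string when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Size of the string when encoded as UTF-16.
    // UTF_16BE is used so that no byte-order mark is counted.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String cjk = "\u4E2D\u6587\u5B57\u7B26"; // four CJK characters, all in the BMP

        System.out.println(utf8Bytes(ascii) + " vs " + utf16Bytes(ascii)); // 5 vs 10
        System.out.println(utf8Bytes(cjk) + " vs " + utf16Bytes(cjk));     // 12 vs 8
    }
}
```

For the ASCII sample UTF-8 is half the size of UTF-16; for the CJK sample it is 12 / 8 = 1.5 times larger.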
Could the choice between UTF-8 and UTF-16 depend on the character set of the installed operating system?
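String uses the system default encoding; on my machine (Chinese Windows XP), for example, that is GBK.
You can call java.nio.charset.Charset.defaultCharset() to see the default encoding, and if you want the bytes in another encoding you can call String's byte[] getBytes(String charsetName).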
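The two calls mentioned above can be tried out as follows. This is a sketch, not a definitive answer to the thread: the class and helper names (`DefaultCharsetDemo`, `encodedLength`, `roundTrip`) are hypothetical, and only charsets that every Java platform is required to support are used. As far as I can tell, the default charset only comes into play when a String is converted to or from bytes; the round trip below returns an equal String whichever supported encoding the bytes pass through.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

final class DefaultCharsetDemo {
    // Length in bytes of s when encoded with the named charset.
    static int encodedLength(String s, String charsetName) {
        try {
            return s.getBytes(charsetName).length;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }

    // Encodes s with the named charset and decodes the bytes again.
    static String roundTrip(String s, String charsetName) {
        try {
            return new String(s.getBytes(charsetName), charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Platform dependent, for example GBK on a Chinese Windows installation.
        System.out.println(Charset.defaultCharset());

        String s = "\u4E2D\u6587"; // two CJK characters
        System.out.println(encodedLength(s, "UTF-8"));       // 6
        System.out.println(encodedLength(s, "UTF-16BE"));    // 4
        System.out.println(roundTrip(s, "UTF-8").equals(s)); // true
    }
}
```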
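A string's runtime encoding is the operating system's default encoding, but in the end it must be stored as UTF-8.
A String is a char array, so it should be UTF-8.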
Unicode is a relatively inefficient encoding when most of your text consists of ASCII
characters. Every character requires the same number of bytes—two—even though some
characters are used much more frequently than others. A more efficient encoding would use
fewer bits for the more common characters. This is what UTF-8 does.
In UTF-8 the ASCII alphabet is encoded using a single byte, just as in ASCII. The next 1,919
characters are encoded in two bytes. The remaining Unicode characters are encoded in three
bytes. However, since these three-byte characters are relatively uncommon, especially in
English text, the savings achieved by encoding ASCII in a single byte more than makes up for
it.
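The byte counts in the passage above can be checked with a small sketch. The names `Utf8Width` and `utf8Width` are hypothetical. One caveat the quoted text does not cover: code points above U+FFFF, which did not exist in early Unicode versions, take four bytes in standard UTF-8.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Width {
    // Number of bytes standard UTF-8 uses for a single Unicode code point.
    static int utf8Width(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Width('A'));     // 1, ASCII
        System.out.println(utf8Width(0x00E9));  // 2, Latin small e with acute
        System.out.println(utf8Width(0x07FF));  // 2, last two-byte code point
        System.out.println(utf8Width(0x0800));  // 3, first three-byte code point
        System.out.println(utf8Width(0x4E2D));  // 3, a CJK ideograph
        System.out.println(utf8Width(0x1F600)); // 4, outside the BMP
    }
}
```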
Java's .class files use UTF-8 internally to store string literals. Data input streams and data
output streams also read and write strings in UTF-8. However, this is all hidden from direct
view of the programmer, unless perhaps you're trying to write a Java compiler or parse output
of a data stream without using the DataInputStream class.
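To see what the data streams actually write, here is a minimal sketch with hypothetical helper names (`WriteUtfDemo`, `writeUtf`, `readUtf`). According to the `DataOutputStream` Javadoc, `writeUTF` emits a two-byte length followed by the string in what it calls modified UTF-8, which matches standard UTF-8 for the characters used here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class WriteUtfDemo {
    // Serializes s with DataOutputStream.writeUTF and returns the raw bytes.
    static byte[] writeUtf(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Reads back a string written by writeUtf.
    static String readUtf(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String s = "A\u4E2D"; // one ASCII letter and one CJK character
        byte[] bytes = writeUtf(s);

        // 2 length bytes + 1 byte for 'A' + 3 bytes for U+4E2D
        System.out.println(bytes.length);              // 6
        System.out.println(readUtf(bytes).equals(s));  // true
    }
}
```

In memory the same string is still two chars; the UTF-8 form only exists in the serialized bytes.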