Help needed: Java Unicode encoding problem

public static void main(String[] args) throws Exception {
    String s = "&#21704;";

    // do the conversion...

    // expected output: 哈
}

Solutions »

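public static void main(String args[]) {
    String s = "&#21704;";
    // Strip the "&#" and ";" wrapper, leaving only the decimal code point.
    int i = Integer.parseInt(s.replaceAll("&#(\\d+);", "$1"));
    // Cast the code point to a char and print it.
    System.out.println((char) i);
}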
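I have answered this question on CSDN at least 5 times! http://topic.csdn.net/u/20080627/09/2b254473-8bc7-4da6-a60b-cf4295c126bf.html
See reply #10 there.
  That does work, but I don't understand what the &#; encoding is for. Is there really no direct way to convert it? Is brute-force extraction and conversion the only option?
Actually it isn't quite right, because it only handles a single reference and the string can't contain any other characters...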
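Here's one more; after that, go read the link 火龙果 posted...
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String args[]) {
    String s = "&#21704;&#21704;&#21704;";
    String regex = "&#(\\d+);";
    // Turn every reference into a %c placeholder for String.format.
    String s2 = s.replaceAll(regex, "%c");
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(s);
    // Collect the code points in order of appearance.
    List<Integer> list = new ArrayList<Integer>();
    while (m.find()) {
        list.add(Integer.valueOf(m.group(1)));
    }
    Object[] values = list.toArray(new Integer[0]);
    // %c formats each Integer as the character with that code point.
    String s3 = String.format(s2, values);
    System.out.println(s3);
}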
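I understand the solution, but I don't understand why HTML is involved here. What kind of encoding is this, exactly, and why does it have to be parsed and converted this way? I just looked at the Apache Commons Lang package, and it takes a similar approach. Can anyone explain this encoding? I ran into it in a JMS project:
I manually created a Message in the Apache MQ console, and this encoding showed up. I really don't understand why it does that.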
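To make the notation concrete, here is a minimal sketch of the reverse direction: turning non-ASCII characters into decimal numeric character references so the text survives as plain ASCII. The class and method names (`NcrEncodeSketch`, `toNumericReferences`) are made up for illustration. Whether the console in question does exactly this is an assumption; the thread does not confirm it.

```java
class NcrEncodeSketch {
    // Replace every non-ASCII code point with a decimal reference of the form &#N;.
    static String toNumericReferences(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);                  // plain ASCII passes through unchanged
            } else {
                out.append("&#").append(cp).append(';');  // e.g. U+54C8 becomes &#21704;
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // \u54C8 is the character 哈
        System.out.println(toNumericReferences("\u54C8 = ha")); // prints: &#21704; = ha
    }
}
```

So the string in the question is just the character 哈 written as its code point in decimal, which is why decoding comes down to parsing the number back out.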
[Quote a passage from Wikipedia:]
In order to work around the limitations of legacy encodings, HTML is designed such that it is possible to represent characters from the whole of Unicode inside an HTML document by using a numeric character reference: a sequence of characters that explicitly spell out the Unicode code point of the character being represented. A character reference takes the form &#N;, where N is either a decimal number for the Unicode code point, or a hexadecimal number, in which case it must be prefixed by x. The characters that compose the numeric character reference are universally representable in every encoding approved for use on the Internet. For example, a Unicode code point like U+53F6, which corresponds to a particular Chinese character, has to be converted to a decimal number, preceded by &# and followed by ;, like this: &#21494;, which produces this: 叶 (if it doesn't look like a Chinese character, see the special characters note at bottom of article). The support for hexadecimal in this context is more recent, so older browsers might have problems displaying characters referenced with hexadecimal numbers, but they will probably have a problem displaying Unicode characters above code point 255 anyway. To ensure better compatibility with older browsers, it is still a common practice to convert the hexadecimal code point into a decimal value (for example &#21494; instead of &#x53F6;).
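[/Quote]
Another revision... this one adds hexadecimal handling and an upper limit of ffff.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String args[]) {
    // Mixed input: a hex reference, decimal references, and a literal character (decodes to 海-谢谢哈).
    String s = "&#x6d77;-&#35874;谢&#21704;";
    String regex = "&#(x([0-9a-fA-F]{1,4})|(6(5(5((3[0-5])|([0-2]\\d))|([0-4]\\d{1,2}))|([0-4]\\d{1,3})))|([0-5]\\d{1,4}));";
    String s2 = s.replaceAll(regex, "%c");
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(s);
    List<Integer> list = new ArrayList<Integer>();
    while (m.find()) {
        if (m.group(2) != null)
            // Group 2 holds the hex digits that follow the x prefix.
            list.add(Integer.valueOf(m.group(2), 16));
        else
            // Otherwise group 1 is the decimal number itself.
            list.add(Integer.valueOf(m.group(1)));
    }
    String s3 = String.format(s2, list.toArray());
    System.out.println(s3);
}
External libraries and the like are what I hate most = =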
The regex was wrong, so here it is fixed again...
String regex = "&#(x([0-9a-fA-F]{1,4})|(6(5(5((3[0-5])|([0-2]\\d))|([0-4]\\d{0,2}))|([0-4]\\d{0,3})))|([0-5]\\d{0,4})|([6-9])|([6-9]\\d{0,3}));";
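An alternative to encoding the numeric range in the regex is to match the digits loosely and check the range in code. The sketch below does that with `Matcher.appendReplacement`, which also avoids routing the text through `String.format` (a literal `%` in the input would break the `%c` approach). The class and method names are made up for illustration. Unlike the versions above, it accepts code points up to U+10FFFF rather than stopping at ffff, and it leaves out-of-range references untouched; both are design choices of this sketch, not something the thread specifies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NcrDecodeSketch {
    // Group 1: hex digits after &#x; group 2: decimal digits after &#.
    // The digit counts only keep the parsed value inside int range; validity is checked below.
    private static final Pattern NCR =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static String decode(String text) {
        Matcher m = NCR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            String replacement = Character.isValidCodePoint(cp)
                    ? new String(Character.toChars(cp)) // may be a surrogate pair above U+FFFF
                    : m.group();                        // out of range: keep the reference as written
            // quoteReplacement stops '$' and '\' in the replacement from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("100% &#x6d77;-&#35874;&#21704;"));
    }
}
```

Text around the references, including `%` signs, passes through unchanged because only the matched spans are replaced.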