A question that keeps up with the times....

///1............
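Unicode is a standard. UTF-8 is not a subset of it in the strict sense: UTF-8 is one concrete encoding form, a rule for storing Unicode code points as bytes.
Unicode itself is the character set that every effort toward a single worldwide encoding standard is aiming at.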
///2.............
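UTF-8 is a transformation format of Unicode (ISO 10646).
UTF stands for Unicode/UCS Transformation Format. The two most common forms are UTF-8 and UTF-16 (UTF-32 also exists).
UTF-8 dominates in files and on the network, while UTF-16 is mostly used internally, for example by Windows NT and Java. The mapping from code point to UTF-8 bytes is:
Unicode 0000 - 007F is encoded in UTF-8 as: 0xxxxxxx
Unicode 0080 - 07FF is encoded in UTF-8 as: 110xxxxx 10xxxxxx
Unicode 0800 - FFFF is encoded in UTF-8 as: 1110xxxx 10xxxxxx 10xxxxxx
Unicode 10000 - 10FFFF is encoded in UTF-8 as: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx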
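To make the bit patterns concrete, here is a minimal Java sketch that builds the UTF-8 bytes by hand for the one-, two- and three-byte forms (U+0000 to U+FFFF, surrogates excluded) and compares the result with the JDK's own encoder. The class name Utf8Sketch and the methods encode and hex are made up for this illustration; they are not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Sketch {
    /** Encodes one code point in U+0000..U+FFFF by filling in the x bits of the patterns. */
    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("outside the range this sketch handles: " + cp);
        }
        if (cp <= 0x7F) {
            // 0xxxxxxx
            return new byte[] { (byte) cp };
        }
        if (cp <= 0x7FF) {
            // 110xxxxx 10xxxxxx
            return new byte[] { (byte) (0xC0 | (cp >> 6)), (byte) (0x80 | (cp & 0x3F)) };
        }
        // 1110xxxx 10xxxxxx 10xxxxxx
        return new byte[] {
            (byte) (0xE0 | (cp >> 12)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))
        };
    }

    /** Formats bytes as space-separated upper-case hex. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] samples = { 0x41, 0xE9, 0x4E2D };
        for (int cp : samples) {
            byte[] mine = encode(cp);
            byte[] jdk = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            System.out.printf("U+%04X -> %s (matches JDK: %b)%n", cp, hex(mine), Arrays.equals(mine, jdk));
        }
    }
}
```

For the Han character U+4E2D this gives the three bytes E4 B8 AD, which is why Chinese text takes three bytes per character in UTF-8.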
///3...............
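UTF-8 is a newer encoding form of Unicode; Unicode has in fact had several encoding forms.
Unicode character codes were originally 16 bits wide, which turned out to be too small to place every character in the world in a single plane
(rare Han characters and many historic and minority scripts did not fit), so the code space was extended beyond 16 bits. UTF-8 as first designed could carry values of up to 31 bits in six bytes; RFC 3629 now limits it to U+10FFFF, at most four bytes per character.
The idea behind Unicode is to give all characters one unified encoding under a single standard. Big5 and GB are independent character sets, also known as
Far East character sets; open such text on a German edition of Windows and the byte values collide with that system's own code page. The default character set of early Windows was
ANSI. Chinese text typed into Notepad is stored in the local encoding, but NT/2000 support Unicode directly internally.
notepad.exe works with ANSI characters on Windows 95 and 98
and with Unicode on NT. ANSI and Unicode text can be mapped onto each other, that is, converted.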
///4.............
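ASCII is a 7-bit character set (normally stored one character per 8-bit byte); characters outside its range, such as Han characters, cannot be represented.
Unicode was originally a 16-bit character set, with regions of the code space allocated to different scripts. It is a character encoding standard drawn up jointly
by several major IT companies. In a Unicode environment such as Windows NT a character occupies two bytes (16 bits), while in an ANSI environment such as Windows 98
a Western character occupies one byte (8 bits) and a Chinese character two bytes. A 16-bit Unicode character allows at most 65,536 values; the data type is called WCHAR.
For existing ANSI characters, Unicode simply widens the value to 16 bits: for example ANSI "A" = 0x41, and the corresponding Unicode is
"A" = 0x0041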
///5.....................
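ASCII uses seven bits to hold 128 characters. ASCII is very much an American standard, so it cannot meet the needs of other countries, for example
Slavic alphabets and Han characters.
This led to the Windows ANSI character set, an extended ASCII that stores a character in 8 bits: the lower 128 values are still the original ASCII codes,
and the upper 128 values add further characters such as Greek letters, depending on the code page.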
///6..............
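#ifdef UNICODE
typedef wchar_t TCHAR;   /* wide build: one TCHAR is a 16-bit character */
#else
typedef char TCHAR;      /* ANSI build: one TCHAR is a single byte */
#endif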
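You need to add UNICODE and _UNICODE under Project\Settings\C/C++\Preprocessor definitions.
Define both UNICODE and _UNICODE. If UNICODE is not defined, a call such as SetText(HWND, LPCTSTR) is resolved to SetTextA(HWND, LPCSTR),
and the API then treats the Unicode string you pass as an ANSI string and displays garbage.
This happens because the Windows API is already compiled into DLLs, and a string, whether Unicode or ANSI, is just a buffer of bytes.
Take "0B A3 00 35 24 3C 00 00": read as ANSI, where a string ends at '\0', only the two bytes "0B A3" are read before the terminator;
read as Unicode, the whole string is read up to the terminating '\0\0'.
Since a Unicode string carries no extra indicator bits, the system has to be told which format your string is in.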
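The buffer example can be checked with a small Java sketch that scans the same eight bytes twice, once stopping at a single zero byte (the ANSI view) and once stopping at a zero 16-bit unit (the Unicode view). The class and method names are invented for this illustration, and the scan only imitates what a C string function would do; it does not call any Windows API.

```java
public class TerminatorSketch {
    /** Counts the bytes before the first 0x00 byte, as a char* reader would. */
    static int narrowLength(byte[] buf) {
        int n = 0;
        while (n < buf.length && buf[n] != 0) {
            n++;
        }
        return n;
    }

    /** Counts the 16-bit units before the first 0x0000 unit, as a wchar_t* reader would. */
    static int wideLength(byte[] buf) {
        int n = 0;
        while (2 * n + 1 < buf.length && (buf[2 * n] != 0 || buf[2 * n + 1] != 0)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // The buffer from the text: 0B A3 00 35 24 3C 00 00
        byte[] buf = { 0x0B, (byte) 0xA3, 0x00, 0x35, 0x24, 0x3C, 0x00, 0x00 };
        System.out.println("ANSI reader stops after " + narrowLength(buf) + " bytes");
        System.out.println("Unicode reader stops after " + wideLength(buf) + " units ("
                + 2 * wideLength(buf) + " bytes)");
    }
}
```

The ANSI scan stops after 2 bytes and the Unicode scan after 3 units (6 bytes), so the same memory yields two different strings depending on which entry point is called.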
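Also, the two macros are read by different headers: UNICODE is tested by the Windows SDK headers, and _UNICODE by the C runtime headers (tchar.h). If you are not writing a Windows program, defining _UNICODE alone is enough.
///How can you tell whether a Unicode character is Japanese, a Han character, or ASCII?
API function: GetTextCharset
In addition, IsDBCSLeadByte tells you whether a byte is the lead byte of a double-byte character.
////How to read Chinese (non-Unicode) text from a file into a String (Java)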
// The program below needs JDK 1.1 or later; on JDK 1.0 change it as noted
import java.io.*;

public class Test3 {
    public static void main(String args[]) {
        try {
            FileInputStream fis = new FileInputStream("Test.dat");
            int len = fis.available();
            byte[] b = new byte[len];
            fis.read(b);
            String s = new String(b); // converts from GB (the platform default) to Unicode
            // On JDK 1.0 use: String s = new String(b, 0);
            System.out.println(s); // converts from Unicode back to GB on output
            fis.close();
        } catch (IOException e) {
            System.out.println("error: " + e);
        }
    }
}
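Without a Unicode-aware input/output stream class, the conversion has to be done by hand with String(byte[], "iso-8859-1"); the Reader classes, available since JDK 1.1, solve this directly.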
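As a rough sketch of the Reader approach, the code below wraps a byte stream in an InputStreamReader with an explicit charset, so the decoding no longer depends on the platform default. The class name ReadWithCharset and the method readAll are hypothetical. UTF-8 and an in-memory stream stand in for the GB-encoded Test.dat only so that the example runs on any JVM; whether "GBK" is available depends on the runtime, so treat that part as an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadWithCharset {
    /** Decodes the whole stream with the given charset instead of the platform default. */
    static String readAll(InputStream in, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, cs)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for file contents: the UTF-8 bytes of the two characters U+4E2D U+6587.
        byte[] data = "\u4E2D\u6587".getBytes(StandardCharsets.UTF_8);
        String s = readAll(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        System.out.println("decoded " + data.length + " bytes into " + s.length() + " characters");
        // For a GB-encoded file, assuming the runtime ships that charset:
        // readAll(new FileInputStream("Test.dat"), Charset.forName("GBK"));
    }
}
```

Six bytes come back as two characters, which is the bytes-to-Unicode conversion that new String(b) performs implicitly with the default encoding.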
Here is a way I found to solve the Chinese problem in Java
(including the JDBC-ODBC bridge, although it seems to fit only JDK 1.1.* and 1.0.*).
The trick is to go through an encoding. "iso-8859-1" is the most widely used Western encoding, and it maps every byte to exactly one character.
Suppose the Chinese data has been read from a file or database into the String "sChData".
Process it with the following code:
      byte[] ch_byte = sChData.getBytes("iso-8859-1"); // get back the raw bytes that the JVM read the "English" way
      sChData = new String(ch_byte); // decode those bytes with the local encoding into proper Unicode
In the other direction, before the data is written to a file or database, use the following code:
      byte[] ch_byte = sChData.getBytes();
      // split the string into a byte array in the local encoding
      sChData = new String(ch_byte, "iso-8859-1");
      // repackage the bytes the "English" way, one character per byte
If you want to pass data between threads, use Reader/Writer
in JDK 1.1.* or JDK 1.2.* to send the Unicode data produced by
the first step. This approach works in all Java versions except for JDBC in JDK 1.2
(for that case I have no solution yet).
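The round trip through "iso-8859-1" can be tried in isolation with the sketch below. It works because that encoding maps each of the 256 byte values to one character and back without loss. The names Latin1RoundTrip, disguise and repair are invented for this example, and UTF-8 is used as a stand-in for the local GB encoding so the code runs on any JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1RoundTrip {
    /** Packs the real-charset bytes of a string into chars 0..255 (what a byte-blind driver hands back). */
    static String disguise(String text, Charset real) {
        return new String(text.getBytes(real), StandardCharsets.ISO_8859_1);
    }

    /** Recovers text whose bytes were wrongly decoded as ISO-8859-1. */
    static String repair(String misdecoded, Charset real) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1); // one char -> one original byte
        return new String(raw, real);
    }

    public static void main(String[] args) {
        Charset real = StandardCharsets.UTF_8; // stand-in for the local GB encoding
        String original = "\u5F20\u4E09";      // the two-character name used in the SQL example
        String disguised = disguise(original, real);
        System.out.println("disguised length: " + disguised.length());
        System.out.println("repaired equals original: " + repair(disguised, real).equals(original));
    }
}
```

The disguised string has one char per byte (six here), and repairing it returns the original two characters. The trick only holds while every layer in between leaves chars 0..255 untouched.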
////////////////////////
My code is here (it uses Chinese parameters in the SQL to
read Chinese data from the database; environment: VisualAge for Java 2.0):
import java.sql.*;
/**
* This type was created in VisualAge.
*/
class MyTest {
/**
* This method was created in VisualAge.
* @param args java.lang.String[]
*/
public static void main(String args[]) {
    Connection con = null;
    Statement myState = null;
    ResultSet myRSet = null;
    String url, sName;
    try {
        url = "jdbc:odbc:paper";
        Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
        con = DriverManager.getConnection(url, "", "");
        myState = con.createStatement();
        String sql = "select paperName from paper where author = \'张三\'";
        byte[] ch_Byte = sql.getBytes();
        sql = new String(ch_Byte,"iso-8859-1");
        myRSet = myState.executeQuery(sql);
        while (myRSet.next()) {
            sName = myRSet.getString("paperName");
            ch_Byte = sName.getBytes("iso-8859-1");
            sName = new String(ch_Byte);
            System.out.println("Name: " + sName);
        }
    } catch (Exception e) {
        System.out.println("error: " + e.toString());
    }
}
}
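#define USES_CONVERSION
#define A2W(s) _len = 2*strlen(s); \
    AfxA2WHelper((LPWSTR)alloca(_len), ...);
AfxA2WHelper is a helper function that calls MultiByteToWideChar; the macros are shown here in simplified form.
//////////////////////////////////////////////////////////////////
char* p = new char[len];
DoSomething(p);
delete [] p;
Replacing it with the following code is more efficient, because alloca takes the memory from the stack and it is released automatically when the function returns:
char *p = (char*)alloca(len);
DoSomething(p);
// no need to call delete p!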
