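txt files encoded as Unicode, Unicode big endian, or UTF-8 begin with a few extra bytes, the byte order mark (BOM): FF FE for Unicode (UTF-16 little endian), FE FF for Unicode big endian (UTF-16 big endian), and EF BB BF for UTF-8.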
The Windows API function IsTextUnicode can also be used to guess whether a buffer contains Unicode text.
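Files that start with FF FE are Unicode
Files that start with FE FF are Unicode big endian
Files that start with EF BB BF are UTF-8
Anything else is treated as the local code page (ANSI); note that a UTF-8 file saved without a BOM also falls into this case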
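The rules above can be sketched as a small function that works on a byte buffer instead of a file. This is only a minimal sketch: the name DetectBom and the returned labels are made up for illustration and are not part of the original program.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: classify a buffer by its leading BOM bytes.
// The size checks keep the function from reading past a very short buffer.
std::string DetectBom(const unsigned char* data, std::size_t size) {
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "Unicode";
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "Unicode big endian";
    // No BOM found: assume the local code page, as the list above does.
    return "ANSI";
}
```

Passing the first few bytes read from a file (opened in binary mode) together with the number of bytes actually read is enough; the rest of the file is not needed for this check.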
IsTextUnicode
The IsTextUnicode function determines whether a buffer is likely to contain a form of Unicode text. The function uses various statistical and deterministic methods to make its determination, under the control of flags passed via lpi. When the function returns, the results of such tests are reported via lpi.
BOOL IsTextUnicode(
CONST VOID* pBuffer, // input buffer to be examined
int cb, // size of input buffer
LPINT lpi // options
);
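As a rough sketch of how the lpi parameter might be used, the function below asks IsTextUnicode to run only the two BOM tests and then reads the result back from the same variable. The name CheckSignature and its return codes are hypothetical. The code assumes a Windows build linked against Advapi32.lib; on other platforms it just returns -1 so that it still compiles.

```cpp
#ifdef _WIN32
#include <windows.h>
#ifdef _MSC_VER
#pragma comment(lib, "Advapi32.lib") // IsTextUnicode lives in Advapi32
#endif
#endif

// Hypothetical helper. Returns 1 for a little-endian BOM, 2 for a
// big-endian (reversed) BOM, 0 for neither, -1 if the API is unavailable.
int CheckSignature(const unsigned char* buf, int size) {
#ifdef _WIN32
    // On input: the tests to run. On output: the tests that passed.
    int flags = IS_TEXT_UNICODE_SIGNATURE | IS_TEXT_UNICODE_REVERSE_SIGNATURE;
    IsTextUnicode(buf, size, &flags);
    if (flags & IS_TEXT_UNICODE_SIGNATURE) return 1;
    if (flags & IS_TEXT_UNICODE_REVERSE_SIGNATURE) return 2;
    return 0;
#else
    (void)buf;
    (void)size;
    return -1; // not a Windows build
#endif
}
```

IsTextUnicode only knows about UTF-16 style text, so it does not replace the EF BB BF check for UTF-8.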
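#include <cstdio>
#include <iostream>
using namespace std;

int main(int argc, char* argv[])
{
    // Open the file to check in binary mode so the bytes are read unchanged
    FILE *f = fopen("c:\\UTF-8.txt", "rb");
    if (f == NULL) {
        cout << "Cannot open the file";
        return 1;
    }
    // Use unsigned char here: with a signed char, bytes such as 0xEF become
    // negative and never compare equal to the constants below
    unsigned char chFileFlag[3] = {0, 0, 0};
    size_t n = fread(chFileFlag, 1, 3, f); // n is the number of bytes actually read
    fclose(f);
    if (n == 3 && chFileFlag[0] == 0xEF && chFileFlag[1] == 0xBB && chFileFlag[2] == 0xBF)
        cout << "The text is a UTF-8 file";
    else if (n >= 2 && chFileFlag[0] == 0xFF && chFileFlag[1] == 0xFE)
        cout << "The text is a Unicode file";
    else if (n >= 2 && chFileFlag[0] == 0xFE && chFileFlag[1] == 0xFF)
        cout << "The text is a Unicode big endian file";
    else
        cout << "The text is an ANSI file";
    return 0;
}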
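The remark about unsigned char in the program above is easy to check in isolation. The following sketch compares the same byte value stored in a signed char and in an unsigned char; the helper names are invented for this illustration, and the behaviour described for the signed case assumes the usual two's complement conversion used by mainstream compilers.

```cpp
// Hypothetical helpers for illustration only.

// Stores the byte in a signed char before comparing.
// 0xEF does not fit, so it is stored as -17; in the comparison it is
// promoted to the int -17, which is not equal to 239 (0xEF).
bool MatchesAsSigned(int byteValue, int expected) {
    signed char c = static_cast<signed char>(byteValue);
    return c == expected;
}

// Stores the byte in an unsigned char before comparing.
// 0xEF stays 239, so the comparison with 0xEF succeeds.
bool MatchesAsUnsigned(int byteValue, int expected) {
    unsigned char c = static_cast<unsigned char>(byteValue);
    return c == expected;
}
```

All three BOMs start with bytes of 0x80 or above (EF, FF, FE), so with a signed buffer none of the comparisons would match and every file would be reported as ANSI.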