"select length(column) from table" still returns the length in characters, not in bytes. I inserted 676 characters (including Chinese characters, line breaks, and so on), and it still returned 676. I then checked the official Oracle documentation, which says:

# If the driver is JDBC OCI and the client character set is not US7ASCII or WE8ISO8859P1, then a call to getBinaryStream() returns UTF-8. If the client character set is US7ASCII or WE8ISO8859P1, then the call returns a US7ASCII stream of bytes.
# If the driver is JDBC Thin and the database character set is not US7ASCII or WE8ISO8859P1, then a call to getBinaryStream() returns UTF-8. If the server-side character set is US7ASCII or WE8ISO8859P1, then the call returns a US7ASCII stream of bytes.

I tested this on VARCHAR2 and saw the same behavior. In other words, if the relevant character set (the client's for JDBC OCI, the database's for JDBC Thin) is neither US7ASCII nor WE8ISO8859P1, UTF-8 is used. So when deciding whether to use setString or setCharacterStream, could I do this:

if(content.getBytes("utf8").length > 2000){ then.....} ?

If I check it this way, will it cause problems on other databases?
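The check proposed in the question can be sketched as a small helper. This is only an illustration under stated assumptions: the class name BindChooser, its method names, and the constant MAX_INLINE_BYTES are hypothetical, and the 2000 threshold is copied from the question rather than taken from any driver documentation. Whether a byte-based threshold is right for other databases is not settled by this sketch.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper: picks a binding method from the UTF-8 byte length of the value.
class BindChooser {
    // Assumption: threshold copied from the question; adjust to the real limit in use.
    static final int MAX_INLINE_BYTES = 2000;

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String content) {
        return content.getBytes(StandardCharsets.UTF_8).length;
    }

    // True when the UTF-8 encoding is longer than the assumed threshold.
    static boolean needsStream(String content) {
        return utf8Length(content) > MAX_INLINE_BYTES;
    }

    // Binds the value with setCharacterStream when it is too long, otherwise with setString.
    static void bind(PreparedStatement ps, int index, String content) throws SQLException {
        if (needsStream(content)) {
            // The length argument of setCharacterStream is counted in characters, not bytes.
            ps.setCharacterStream(index, new StringReader(content), content.length());
        } else {
            ps.setString(index, content);
        }
    }
}
```

A call such as BindChooser.bind(ps, 1, content) would replace a direct ps.setString(1, content). Note that the decision is made on bytes while the stream length is passed in characters, which is exactly the distinction the question is about.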
Since you are using JDBC, this happens because the client character set differs from the server character set.

Character set name    Encoding
US7ASCII              US 7-bit ASCII character set
WE8DEC                DEC West European 8-bit character set
WE8HP                 HP West European Laserjet 8-bit character set
F7DEC                 DEC French 7-bit character set
WE8EBCDIC500          IBM West European EBCDIC Code Page 500
WE8PC850              IBM PC Code Page 850
WE8ISO8859P1          ISO 8859-1 West European 8-bit character set
I ran some tests; please read them carefully. For the two character sets mentioned in the documentation, US7ASCII and WE8ISO8859P1, I tested single-byte and double-byte characters separately.

SELECT length(CONVERT('我', 'US7ASCII','UTF8')) from dual -- returns 1
SELECT length(CONVERT('我', 'UTF8','US7ASCII')) from dual -- returns 2
SELECT lengthb(CONVERT('我', 'US7ASCII','UTF8')) from dual -- returns 1
SELECT lengthb(CONVERT('我', 'UTF8','US7ASCII')) from dual -- returns 2
SELECT length(CONVERT('A', 'US7ASCII','UTF8')) from dual -- returns 1
SELECT lengthb(CONVERT('A', 'US7ASCII','UTF8')) from dual -- returns 1
SELECT length(CONVERT('A', 'UTF8','US7ASCII')) from dual -- returns 1
SELECT lengthb(CONVERT('N', 'UTF8','US7ASCII')) from dual -- returns 1

SELECT length(CONVERT('我', 'WE8ISO8859P1','UTF8')) from dual -- returns 0
SELECT length(CONVERT('我', 'UTF8','WE8ISO8859P1')) from dual -- returns 2
SELECT lengthb(CONVERT('我', 'WE8ISO8859P1','UTF8')) from dual -- returns 1
SELECT lengthb(CONVERT('我', 'UTF8','WE8ISO8859P1')) from dual -- returns 4
SELECT length(CONVERT('A', 'WE8ISO8859P1','UTF8')) from dual -- returns 1
SELECT lengthb(CONVERT('A', 'WE8ISO8859P1','UTF8')) from dual -- returns 1
SELECT length(CONVERT('A', 'UTF8','WE8ISO8859P1')) from dual -- returns 1
SELECT lengthb(CONVERT('N', 'UTF8','WE8ISO8859P1')) from dual -- returns 1
SELECT length(CONVERT('我AB', 'US7ASCII','UTF8')) from dual -- returns 3
SELECT lengthb(CONVERT('我AB', 'US7ASCII','UTF8')) from dual -- returns 3
SELECT length(CONVERT('我AB', 'UTF8','US7ASCII')) from dual -- returns 4
SELECT lengthb(CONVERT('我AB', 'UTF8','US7ASCII')) from dual -- returns 4
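For comparison with the length/lengthb tests above, here is a rough sketch of the same character-versus-byte distinction on the Java side. It does not reproduce Oracle's CONVERT; it only shows how String.length() and getBytes() differ for the same sample strings. The class name LengthDemo and the method byteLength are made up for illustration, and '我' is written as the escape \u6211 so the source file encoding does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: character count versus byte count in several charsets.
class LengthDemo {
    // Number of bytes the string occupies in the given charset.
    // Characters the charset cannot represent are replaced by the charset's
    // substitution byte ('?' for US-ASCII and ISO-8859-1).
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String[] samples = {"A", "\u6211", "\u6211AB"};
        for (String s : samples) {
            System.out.println(s
                + " chars=" + s.length()
                + " utf8=" + byteLength(s, StandardCharsets.UTF_8)
                + " ascii=" + byteLength(s, StandardCharsets.US_ASCII)
                + " latin1=" + byteLength(s, StandardCharsets.ISO_8859_1));
        }
    }
}
```

In Java, '我' is one character but three bytes in UTF-8, while in US-ASCII or ISO-8859-1 it collapses to a single substitution byte, so a byte count taken in one encoding says little about the size in another.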