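My string is too long and I only want to keep the first 4000 bytes. The string may contain all kinds of characters, so how can I make sure the last character is still complete after truncation?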

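For example, if the last character takes 4 bytes and the cut at byte 4000 keeps only 2 of them, that character is incomplete and should be dropped. If the last character takes 2 bytes and the cut at byte 4000 falls exactly at its end, the character is complete and should be kept. How do I tell these cases apart?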

Solutions »


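I've seen this problem before but can't remember the details. You should be able to decide based on which bytes the array ends with. This is only an idea; check the documentation for the specifics. I haven't used Java in a long time and I'm rusty, so I can't give a complete solution. Frustrating.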
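It depends on which encoding your string uses.
From your description it sounds like UTF-8. In that case, write your own truncation function: iterate over the string's byte array and inspect the value of each byte. From its bit pattern you can tell whether the byte stands alone or is part of a multi-byte character.
I haven't read the UTF-8 rules in detail, but as I recall each byte of a 1-, 2-, or 3-byte character follows a fixed pattern (the high bits of the lead byte give the sequence length, and every continuation byte has the form 10xxxxxx).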
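The byte-inspection idea above can be sketched in Java as follows. This is only a minimal illustration, assuming the text is encoded as UTF-8 and `maxBytes` is not negative; the class name `Utf8Truncate` and the method `truncateUtf8` are made up for this example. It relies on the fact that every UTF-8 continuation byte has the bit pattern 10xxxxxx, so if the first dropped byte is a continuation byte, the cut landed inside a character and has to move back to that character's lead byte.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Truncate {
    /** Returns the longest prefix of s whose UTF-8 encoding fits in maxBytes (maxBytes >= 0). */
    static String truncateUtf8(String s, int maxBytes) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxBytes) {
            return s; // already short enough
        }
        int end = maxBytes;
        // bytes[end] is the first byte that would be dropped. If it is a
        // continuation byte (10xxxxxx), the cut is inside a character, so
        // back up until the cut sits on a lead byte.
        while (end > 0 && (bytes[end] & 0xC0) == 0x80) {
            end--;
        }
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "ab\u4E2D\u6587"; // 'a', 'b', then two 3-byte characters
        System.out.println(truncateUtf8(s, 4)); // prints "ab"
        System.out.println(truncateUtf8(s, 5)); // prints "ab" plus the first 3-byte character
    }
}
```

For the original question the call would be `truncateUtf8(value, 4000)`. Because the result is decoded from whole sequences only, a 4-byte character is either kept entirely or dropped entirely.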
http://www.wu-ftpd.org/rfc/rfc2640.html
The link above is RFC 2640 (FTP internationalization), which describes the UTF-8 format. See section B.1, "Valid UTF-8 check"; it demonstrates how to tell whether a byte string is valid UTF-8 :)

The following routine checks if a byte sequence is valid UTF-8. This
is done by checking for the proper tagging of the first and following
bytes to make sure they conform to the UTF-8 format. It then checks
to assure that the data part of the UTF-8 sequence conforms to the
proper range allowed by the encoding. Note: This routine will not
detect characters that have not been assigned and therefore do not
exist.

int utf8_valid(const unsigned char *buf, unsigned int len)
{
const unsigned char *endbuf = buf + len;
unsigned char byte2mask=0x00, c;
int trailing = 0;  // trailing (continuation) bytes to follow

while (buf != endbuf)
{
   c = *buf++;
   if (trailing)
    if ((c&0xC0) == 0x80)  // Does trailing byte follow UTF-8 format?
    {if (byte2mask)        // Need to check 2nd byte for proper range?
      if (c&byte2mask)     // Are appropriate bits set?
       byte2mask=0x00;
      else
       return 0;
     trailing--; }
    else
     return 0;
   else
    if ((c&0x80) == 0x00)  continue;      // valid 1 byte UTF-8
    else if ((c&0xE0) == 0xC0)            // valid 2 byte UTF-8
          if (c&0x1E)                     // Is UTF-8 byte in
                                          // proper range?
           trailing =1;
          else
           return 0;
    else if ((c&0xF0) == 0xE0)           // valid 3 byte UTF-8
          {if (!(c&0x0F))                // Is UTF-8 byte in
                                         // proper range?
            byte2mask=0x20;              // If not set mask
                                         // to check next byte
            trailing = 2;}
    else if ((c&0xF8) == 0xF0)           // valid 4 byte UTF-8
          {if (!(c&0x07))                // Is UTF-8 byte in
                                         // proper range?
            byte2mask=0x30;              // If not set mask
                                         // to check next byte
            trailing = 3;}
    else if ((c&0xFC) == 0xF8)           // valid 5 byte UTF-8
          {if (!(c&0x03))                // Is UTF-8 byte in
                                         // proper range?
            byte2mask=0x38;              // If not set mask
                                         // to check next byte
            trailing = 4;}
    else if ((c&0xFE) == 0xFC)           // valid 6 byte UTF-8
          {if (!(c&0x01))                // Is UTF-8 byte in
                                         // proper range?
            byte2mask=0x3C;              // If not set mask
                                         // to check next byte
            trailing = 5;}
    else  return 0;
}
  return trailing == 0;
}
This is how our team lead solved it:
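import java.nio.charset.StandardCharsets;
import java.util.Arrays;

private String subAttrbuteValue(String attrValue) {
    String attrStr = attrValue;
    int maxLenth = 4000; // limit in bytes
    if (attrStr != null && attrStr.length() > 0) {
        // Name the charset explicitly; the no-argument getBytes() depends on the platform default.
        byte[] attrBytes = attrValue.getBytes(StandardCharsets.UTF_8);
        if (attrBytes.length > maxLenth) {
            // Decode the first maxLenth bytes; a character cut in half decodes to U+FFFD.
            byte[] subAttrBytes = Arrays.copyOf(attrBytes, maxLenth);
            String subStr = new String(subAttrBytes, StandardCharsets.UTF_8);
            int subStrLen = subStr.length();
            // If the same number of chars taken from the original needs more than
            // maxLenth bytes, the last character was cut, so drop it.
            if (attrStr.substring(0, subStrLen).getBytes(StandardCharsets.UTF_8).length > maxLenth) {
                subStrLen--;
            }
            // A cut 4-byte character can leave a lone high surrogate at the end; drop that too.
            if (subStrLen > 0 && Character.isHighSurrogate(attrValue.charAt(subStrLen - 1))) {
                subStrLen--;
            }
            attrStr = attrValue.substring(0, subStrLen);
        }
    }
    return attrStr;
}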
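Another option, sketched here only as an alternative and not taken from any of the replies above, is to let the JDK's `CharsetEncoder` do the counting. The encoder is given an output buffer of exactly `maxBytes` bytes; it stops before any character that does not fit completely, and the input position then says how many chars were consumed. The class name `EncoderTruncate` and the method `truncate` are made up for this example, and UTF-8 and a non-negative `maxBytes` are assumed.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncoderTruncate {
    /** Returns the longest prefix of s whose UTF-8 encoding fits in maxBytes (maxBytes >= 0). */
    static String truncate(String s, int maxBytes) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                // Replace unpaired surrogates instead of stopping with an error.
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer out = ByteBuffer.allocate(maxBytes); // the byte budget
        CharBuffer in = CharBuffer.wrap(s);
        // Encoding stops when the next character no longer fits in 'out';
        // 'in' is then positioned at the first char that was not encoded.
        encoder.encode(in, out, true);
        return s.substring(0, in.position());
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00"; // 'a' followed by one 4-byte character (a surrogate pair)
        System.out.println(truncate(s, 4).length()); // prints 1: the 4-byte character does not fit
        System.out.println(truncate(s, 5).length()); // prints 3: everything fits
    }
}
```

This variant works on chars and never decodes a partial byte sequence, so it does not depend on how broken input is turned into replacement characters.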