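I wrote an itoa function that is 16% faster than VC's itoa: converting 1 million numbers takes only 8.5 seconds, versus 10.2 seconds for VC's itoa. Challengers welcome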

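My machine is a P4 1.4GHz. I tested thoroughly, with value ranging from -1000000 to 1000000 and radix from 2 to 36, and the results are identical to those of itoa. When radix is less than 2 or greater than 36, itoa outputs garbage, whereas my function sets the first character of the string to 0. Challengers are welcome; I hope you can write something more efficient than this. To keep it fair, only C/C++ code is allowed and inline assembly is not. In fact I have already tried inline assembly, but my hand-written assembly was not as fast as the code VC generates after optimization.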
char *myitoa( int value, char *str, int radix )
{
    static char szMap[] = {
        '0', '1', '2', '3', '4', '5',
        '6', '7', '8', '9', 'a', 'b',
        'c', 'd', 'e', 'f', 'g', 'h',
        'i', 'j', 'k', 'l', 'm', 'n',
        'o', 'p', 'q', 'r', 's', 't',
        'u', 'v', 'w', 'x', 'y', 'z'
    }; // digit-to-character map
    int nCount = -1, nIndex;
    char *pStr = str, nTemp;
    if ( radix >= 2 && radix <= 36 )
    { // radix must be between 2 and 36
        if ( value < 0 && radix == 10 )
        { // negative: write a leading minus sign and move past it
            *pStr++ = '-';
            value = -value; // make it positive
        }
        unsigned int nValue = (unsigned int)value;
        do { // convert one digit per iteration until done
            pStr[ ++nCount ] = szMap[ nValue % radix ];
            nValue /= radix;
        } while( nValue > 0 ); // the digits are now in reverse order
        nIndex = ( nCount + 1 ) / 2; // half the length
        while( nIndex-- > 0 ) { // reverse the digits in place
            nTemp = pStr[ nIndex ];
            pStr[ nIndex ] = pStr[ nCount - nIndex ];
            pStr[ nCount - nIndex ] = nTemp;
        }
    }
    pStr[ nCount + 1 ] = '\0'; // write the terminator
    return str;
}
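One edge case the post does not mention is the most negative int: negating it as a signed value overflows. The following is only a sketch of one way to sidestep that, not the original poster's code. The names myitoa_safe and to_text are made up for illustration, and the last check in the usage note assumes a 32-bit int.

```cpp
#include <cassert>
#include <climits>
#include <string>

// Hypothetical variant of myitoa: the magnitude is computed in unsigned
// arithmetic, so INT_MIN is handled without signed overflow.
char *myitoa_safe(int value, char *str, int radix)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    char *out = str;
    if (radix < 2 || radix > 36) {   // invalid radix: return an empty string
        *out = '\0';
        return str;
    }
    unsigned int magnitude = static_cast<unsigned int>(value);
    if (value < 0 && radix == 10) {
        *out++ = '-';
        magnitude = 0u - magnitude;  // wraps modulo 2^N, defined even for INT_MIN
    }
    char *first = out;
    do {                             // emit digits, least significant first
        *out++ = digits[magnitude % static_cast<unsigned int>(radix)];
        magnitude /= static_cast<unsigned int>(radix);
    } while (magnitude > 0);
    *out-- = '\0';                   // terminate; out now points at the last digit
    while (first < out) {            // reverse the digits in place
        char tmp = *first;
        *first++ = *out;
        *out-- = tmp;
    }
    return str;
}

// Hypothetical convenience wrapper so results can be compared as strings.
std::string to_text(int value, int radix)
{
    char buf[40];                    // enough for 32 binary digits, sign and NUL
    return std::string(myitoa_safe(value, buf, radix));
}
```

With a 32-bit int, to_text(INT_MIN, 10) should give "-2147483648", and to_text(255, 16) should give "ff".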



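Of course it isn't supported, but replacing every char in it with wchar_t is all it takes!
w_char only means that one character takes two bytes. What I meant was: is there a way to convert to double-byte/multi-byte characters such as Chinese characters?
Chinese characters have two representations: one is Unicode, which is w_char; the other is multibyte, which is char. I don't know what you mean. Converting to a string like “三千五百二十一” (three thousand five hundred and twenty-one)??
Whether Unicode or Multibyte, the codes of the first 128 characters of the ASCII table are the same: 48 '0' is exactly the same in wchar_t and in char.
Sorry, I didn't make myself clear, heh. The conversion I mean works on single characters. Say there is a number such as 0xdadb (56027), and suppose it stands for the Chinese character “国”; then I want the output to be “国”, not "56027".
Of course this is a conversion tied to a particular character set and has nothing to do with the original poster's challenge. It just occurred to me, so I brought it up; please bear with me.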
The Unicode code of “国” is 0x56fd and its MultiByte code is 0xb9fa.
int x = 0x56fd; or int x = 0xb9fa;
To convert to wchar_t:
wchar_t c[2] = { (wchar_t)x, 0 };
To convert to char:
char c[3] = { ((char*)&x)[1], ((char*)&x)[0], 0 };
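The char version above reads the bytes of x through a pointer, so it depends on the host being little-endian. A minimal sketch of the same conversion using shifts, which does not depend on byte order, might look like this. The helper names dbcs_to_bytes and bmp_to_wstring are hypothetical, and the wide version assumes wchar_t is at least 16 bits.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: split a 16-bit double-byte code into its lead and
// trail bytes with shifts, independent of host byte order.
std::string dbcs_to_bytes(unsigned int code)
{
    assert(code <= 0xFFFF);
    std::string s;
    s += static_cast<char>((code >> 8) & 0xFF);  // lead byte, e.g. 0xB9
    s += static_cast<char>(code & 0xFF);         // trail byte, e.g. 0xFA
    return s;
}

// Hypothetical helper: a code point up to 0xFFFF fits in a single wchar_t
// on platforms where wchar_t is at least 16 bits wide.
std::wstring bmp_to_wstring(unsigned int code)
{
    assert(code <= 0xFFFF);
    return std::wstring(1, static_cast<wchar_t>(code));
}
```

For the example in the thread, dbcs_to_bytes(0xb9fa) yields the two bytes 0xB9 0xFA, and bmp_to_wstring(0x56fd) yields a one-character wide string.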
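to fireseed(奶油狗【奥伊斯特拉赫的声音简直不可思异！】):
"Whether Unicode or Multibyte, the codes of the first 128 characters of the ASCII table are the same: 48 '0' is exactly the same in wchar_t and in char."
Wrong!!! In Multibyte, the digit character 0 is '0', which is 0x30.
In Unicode, the digit character 0 is L'0', which is 0x0030.
0x30 and 0x0030 are different? Haha. Everyone on Earth knows that wchar_t is two bytes and char is one byte, so don't bother splitting hairs here.
What do you think the code below prints?
wchar_t x = L'0';
char y = '0';
if ( (int)x != (int)y ) cout << "creamdog is an idiot";
else cout << "iicup is an idiot";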
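Not bad! It could be a bit faster still. First work out how many characters the output has with the code below:
int i;
for (i=1;i<=32;i++)
{
    if(abs(value)>=pow(radix,i))
        continue;
    else
        break;
}
i is the number of output digits. Once that is known, the result can be written directly into its final positions, which saves the time spent on the string-reversal code afterwards.
When radix is small, binary for instance, the saving should be somewhat larger.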
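for (int i=1;i<=32;i++)
{
    if(abs(value)>=pow(radix,i))
        continue;
    else
        break;
}
That takes far more time than reversing the string.
pow itself is expensive, whereas reversing the string only touches half of it or less.
pow itself is expensive, whereas reversing the string only touches half of it or less.
---------------------------------------------------------------------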
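That's true, and the textbooks say so too: one multiplication costs the CPU several hundred clock cycles, while an addition takes only a few. But a while ago someone started a thread specifically to discuss whether, when optimizing for speed, it is worth turning multiplications into additions wherever possible. I don't remember the URL, sorry... The thread said that, by actual measurement, several million additions (int) were only about 10% faster than the same number of multiplications (float) (roughly that much, I think). If what he said is true, the cost of computing the digit count is still worth paying. That is where my idea above came from (though I haven't tried it yet ^_^). Better to have no books than to trust them blindly; practice is the sole criterion of truth.
The loop above for finding the digit count uses rather a lot of multiplications (one pow per digit), so compute it with the formula below instead:
digit count i=ceil(log10(abs(value))/log10(radix));
No matter how many digits the result has, the number of multiplications is fixed at these few, so in most cases it should be more efficient than the loop.
ceil(log10(abs(value))/log10(radix)); makes four function calls. The time spent just pushing and popping the stack is beyond words.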
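For reference, the digit count can also be obtained with integer division alone, with no pow or log10 calls. Note that the ceil formula above gives 2 for value 100 in radix 10, although "100" has three digits, and it is not defined for value 0. The sketch below is only an illustration of the "count first, then write each digit into its final position" idea. The names digit_count and utoa_no_reverse are hypothetical, and whether this beats a reversal pass would have to be measured, since it divides twice per digit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of digits of `value` in base `radix`,
// computed with integer division only.
int digit_count(unsigned int value, unsigned int radix)
{
    assert(radix >= 2 && radix <= 36);
    int count = 1;
    while (value >= radix) {
        value /= radix;
        ++count;
    }
    return count;
}

// Writes the digits straight into their final positions, last digit first,
// so no reversal pass is needed afterwards.
std::string utoa_no_reverse(unsigned int value, unsigned int radix)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    std::string out(static_cast<std::size_t>(digit_count(value, radix)), '0');
    for (std::size_t pos = out.size(); pos-- > 0; ) {
        out[pos] = digits[value % radix];
        value /= radix;
    }
    return out;
}
```

For example, digit_count(100, 10) is 3 and utoa_no_reverse(255, 16) is "ff".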
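Sangel() ( ) Reputation: 100  2006-6-4 19:46:20  Score: 0


Not bad... but I don't see much point in it. If you want it faster, you might as well use assembly...

====================================
Actually, the code VC produces after optimization is much faster than the assembly you would write!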
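xijiang2006(希疆) ( ) Reputation: 100  2006-06-04 21:51:00  Score: 0

   I don't understand how a library function can be improved on so easily! Haven't library functions gone through extensive testing? Could someone who knows about this explain?
========================================
There are a few reasons:
1. The source code of VC's library had to follow coding standards when it was written. MS's coding standards are quite strict, so some techniques cannot be used.
2. For the sake of portability, MS's code had to give up some C/C++ language features.
3. Many C functions of this kind actually call a single lower-level function internally. That makes the source very readable, but the function calls waste a lot of time.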
Too late to study this now. I'll look at it tomorrow :-)
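1. The source code of VC's library had to follow coding standards when it was written. MS's coding standards are quite strict, so some techniques cannot be used.
2. For the sake of portability, MS's code had to give up some C/C++ language features.
itoa is a runtime function, a standard library function, and has nothing to do with VC or MS. It can be used on Linux just the same.
Those two points have no basis.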
Well, this morning I looked at your code and the MS itoa version. Indeed you are both good. The following is the MS itoa:
/* helper routine that does the main job. */
static void __cdecl xtoa (
        unsigned long val,
        char *buf,
        unsigned radix,
        int is_neg
        )
{
        char *p;                /* pointer to traverse string */
        char *firstdig;         /* pointer to first digit */
        char temp;              /* temp char */
        unsigned digval;        /* value of digit */

        p = buf;

        if (is_neg) {
            /* negative, so output '-' and negate */
            *p++ = '-';
            val = (unsigned long)(-(long)val);
        }

        firstdig = p;           /* save pointer to first digit */

        do {
            digval = (unsigned) (val % radix);
            val /= radix;       /* get next digit */

            /* convert to ascii and store */
            if (digval > 9)
                *p++ = (char) (digval - 10 + 'a');  /* a letter */
            else
                *p++ = (char) (digval + '0');       /* a digit */
        } while (val > 0);

        /* We now have the digit of the number in the buffer, but in reverse
           order.  Thus we reverse them now. */

        *p-- = '\0';            /* terminate string; p points to last digit */

        do {
            temp = *p;
            *p = *firstdig;
            *firstdig = temp;   /* swap *p and *firstdig */
            --p;
            ++firstdig;         /* advance to next two digits */
        } while (firstdig < p); /* repeat until halfway */
}
my test function:
#include <stdio.h>
#include <time.h>
#include <conio.h>   /* getch */

int main(int argc, char* argv[])
{
     char buf[32];
     int i;
     clock_t start, end;

     start = clock();
     for(i=0; i<10000000; i++)
     {
         myitoa(i, buf, 10);
     }
     end = clock();
     printf("my time: %f\n", (end-start) * 1.0/CLOCKS_PER_SEC);
     start = clock();
     for(i=0; i<10000000; i++)
     {
         xtoa((unsigned long)(unsigned int)i, buf, 10, 0);
     }
     end = clock();
     printf("ms time: %f\n", (end-start) * 1.0/CLOCKS_PER_SEC);
     getch();
     return 0;
}
------------
my time: 1.682000
ms time: 1.672000
my time: 1.692000
ms time: 1.662000
my time: 1.692000
ms time: 1.662000
Well, with the default console release configuration in VS 2003, your version is slightly slower :-)
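Below is my test code (hardware: C4 2.0G CPU, 248M RAM; software: XP SP1, VC++6.0):
#include <iostream.h>
#include <stdlib.h>   // itoa
#include <windows.h>  // DWORD, GetTickCount
void main()
{
    char str[256]={0};
    int i;
    DWORD start;
    // test the custom function first
    start=::GetTickCount();
    for(i=0;i<10485760;i++) // convert 10M numbers
        myitoa(i,str,2); // the custom function
    cout<<"Custom function: "<<::GetTickCount()-start<<"ms"<<endl;
    // then test the system function
    start=::GetTickCount();
    for(i=0;i<10485760;i++) // convert the same 10M numbers
        itoa(i,str,2); // the system function
    cout<<"System function: "<<::GetTickCount()-start<<"ms"<<endl;
}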
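Results (after several runs, taking the more stable figures):
Custom function: 15391ms
System function: 15109ms
==================================================
OK! Considering that the function run second has a certain advantage (it may be served from the cache), I swapped the order of the two tests, and the code becomes:
#include <iostream.h>
#include <stdlib.h>   // itoa
#include <windows.h>  // DWORD, GetTickCount
void main()
{
    char str[256]={0};
    int i;
    DWORD start;
    // test the system function first
    start=::GetTickCount();
    for(i=0;i<10485760;i++) // convert 10M numbers
        itoa(i,str,2); // the system function
    cout<<"System function: "<<::GetTickCount()-start<<"ms"<<endl;

    // then test the custom function
    start=::GetTickCount();
    for(i=0;i<10485760;i++) // convert the same 10M numbers
        myitoa(i,str,2); // the custom function
    cout<<"Custom function: "<<::GetTickCount()-start<<"ms"<<endl;
}
Results (after several runs, taking the more stable figures):
System function: 15297ms
Custom function: 15172ms
======================================================================
The comparison shows that within a single run, the function tested second always comes out ahead; comparing the two programs against each other, the system function is slightly faster. That conclusion should be acceptable, I think.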
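to liuguangzhou(光子): You haven't read the source code of VC's C library implementation, so don't talk nonsense. You can eat whatever you like, but you can't say whatever you like.
to Snow_Ice11111: When timing a function, put this before the code:
Sleep( 2000 );
to avoid the start-up delay caused by disk buffering while the system loads the code.
to sclzmbie(忘我):
You only tested base 10. If that is how we are going to test, I can write a dedicated base-10 conversion function that is faster than the system one. Test comprehensively, OK? Also, run the test several times and take the average.
Also, turn on optimization. The system function has been optimized to some degree whether you build debug or release, whereas my function has not, so without a fully optimized build there is no fair comparison to speak of.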
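The testing advice in this thread (warm up before timing, repeat the measurement, build with optimization on) can be sketched as a small portable harness. This is only an illustration, not code from any poster: median and time_median are hypothetical names, std::chrono::steady_clock stands in for GetTickCount, and the median is used here instead of the average because it is less sensitive to an occasional slow run.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: median of a set of timings, in seconds.
double median(std::vector<double> samples)
{
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 ? samples[mid]
                              : (samples[mid - 1] + samples[mid]) / 2.0;
}

// Runs `work` once untimed as a warm-up, then `rounds` timed repetitions,
// and returns the median wall-clock time of one repetition in seconds.
template <typename Work>
double time_median(Work work, int rounds)
{
    assert(rounds > 0);
    work();  // warm-up: bring code and data into memory and cache first
    std::vector<double> samples;
    for (int r = 0; r < rounds; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return median(samples);
}
```

A call such as time_median([&] { for (int i = 0; i < 10000000; ++i) myitoa(i, buf, 10); }, 10) would then time each candidate the same way, regardless of the order in which they are tested.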
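if ( (int)x != (int)y ) cout << "creamdog is an idiot";
else cout << "iicup is an idiot";
to liuguangzhou(光子): You haven't read the source code of VC's C library implementation, so don't talk nonsense. You can eat whatever you like, but you can't say whatever you like.
Is this how the original poster treats differing opinions?
Good thing this is only an online discussion; otherwise I'm afraid every slight would be repaid with fists and feet. Heh.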
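#include <stdio.h>
#include <stdlib.h>
void main(void)
{
    char buf[255];
    int i=100;

    for(int j=2;j<=36;j++) // itoa only accepts a radix from 2 to 36
        printf("\n100 in base %d is: %s",j,itoa(i,buf,j));
}
This is how it should be tested: take the radix into account.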
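fireseed(奶油狗【奥伊斯特拉赫的声音简直不可思异！】) ( ) Reputation: 96  2006-06-04 21:57:00  Score: 0

   xijiang2006(希疆) ( ) Reputation: 100  2006-06-04 21:51:00  Score: 0

   I don't understand how a library function can be improved on so easily! Haven't library functions gone through extensive testing? Could someone who knows about this explain?
========================================
There are a few reasons:
1. The source code of VC's library had to follow coding standards when it was written. MS's coding standards are quite strict, so some techniques cannot be used.
2. For the sake of portability, MS's code had to give up some C/C++ language features.
3. Many C functions of this kind actually call a single lower-level function internally. That makes the source very readable, but the function calls waste a lot of time.

=================================================================================
1. Strict coding standards have nothing to do with the efficiency of the compiled code. Certain techniques are off limits because those jobs should be left to the compiler.
2. The functions that need to worry about portability are the I/O and thread-related ones. There is nothing non-portable about code like itoa.
3. Does "C functions of this kind" mean ones like itoa? Heh. If you count the + and * operations as lower-level functions, then fine; itoa is obviously a compute-bound function.
Of course the original poster's inquisitive attitude deserves praise, but character always matters more than skill. Surely the poster doesn't think that writing an itoa makes him better than MS? Writing things like "xxx is an idiot" in code really is going a bit far.
Trading space for time to gain efficiency is perfectly normal. Someone above posted MS's implementation: four variables in total, no more than 14 bytes under VC. Now look at the poster's code: it starts with a static array of 36 chars, plus the two ints and two pointers defined after it, at least 52 bytes. Using 300% of the space to gain 16% in speed, better not to show off here~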
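singlie
1. Strict coding standards have nothing to do with the efficiency of the compiled code. Certain techniques are off limits because those jobs should be left to the compiler.
2. The functions that need to worry about portability are the I/O and thread-related ones. There is nothing non-portable about code like itoa.
3. Does "C functions of this kind" mean ones like itoa? Heh. If you count the + and * operations as lower-level functions, then fine; itoa is obviously a compute-bound function.
1. You didn't get my point? Trading space for time is exactly what I meant!
2. Go and look at VC's source for strcpy and you will see whether it is portable.
3. Here comes another one who doesn't read the source. Please look at VC's itoa implementation first, and you will see for yourself what lower-level function it calls. Laugh at yourself, heh.
In case you can't find where the source is, I'll just post it. See for yourself: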
(itoa and _itoa are both #defined to itox)
/* Actual functions just call conversion helper with neg flag set correctly,
   and return pointer to buffer. */
TCHAR * __cdecl _itox (
        int val,
        TCHAR *buf,
        int radix
        )
{
        if (radix == 10 && val < 0)
            xtox((unsigned long)val, buf, radix, 1);// <-- isn't this calling a lower-level function?
        else
            xtox((unsigned long)(unsigned int)val, buf, radix, 0);
        return buf;
}
==========================================
Below is the source code of strcpy, strcat and so on. Do you consider this portable?
        page    ,132
        title   strcat - concatenate (append) one string to another
;***
;strcat.asm - contains strcat() and strcpy() routines
;
;       Copyright (c) Microsoft Corporation. All rights reserved.
;
;Purpose:
;       STRCAT concatenates (appends) a copy of the source string to the
;       end of the destination string, returning the destination string.
;
;*******************************************************************************

        .xlist
        include cruntime.inc
        .list
page
;***
;char *strcat(dst, src) - concatenate (append) one string to another
;
;Purpose:
;       Concatenates src onto the end of dest.  Assumes enough
;       space in dest.
;
;       Algorithm:
;       char * strcat (char * dst, char * src)
;       {
;           char * cp = dst;
;
;           while( *cp )
;                   ++cp;           /* Find end of dst */
;           while( *cp++ = *src++ )
;                   ;               /* Copy src to end of dst */
;           return( dst );
;       }
;
;Entry:
;       char *dst - string to which "src" is to be appended
;       const char *src - string to be appended to the end of "dst"
;
;Exit:
;       The address of "dst" in EAX
;
;Uses:
;       EAX, ECX
;
;Exceptions:
;
;*******************************************************************************

page
;***
;char *strcpy(dst, src) - copy one string over another
;
;Purpose:
;       Copies the string src into the spot specified by
;       dest; assumes enough room.
;
;       Algorithm:
;       char * strcpy (char * dst, char * src)
;       {
;           char * cp = dst;
;
;           while( *cp++ = *src++ )
;                   ;               /* Copy src over dst */
;           return( dst );
;       }
;
;Entry:
;       char * dst - string over which "src" is to be copied
;       const char * src - string to be copied over "dst"
;
;Exit:
;       The address of "dst" in EAX
;
;Uses:
;       EAX, ECX
;
;Exceptions:
;*******************************************************************************
        CODESEG

%       public  strcat, strcpy      ; make both functions available
strcpy  proc \
        dst:ptr byte, \
        src:ptr byte

        OPTION PROLOGUE:NONE, EPILOGUE:NONE

        push    edi                 ; preserve edi
        mov     edi,[esp+8]         ; edi points to dest string
        jmp     short copy_start

strcpy  endp

        align   16

strcat  proc \
        dst:ptr byte, \
        src:ptr byte

        OPTION PROLOGUE:NONE, EPILOGUE:NONE

        .FPO    ( 0, 2, 0, 0, 0, 0 )

        mov     ecx,[esp+4]         ; ecx -> dest string
        push    edi                 ; preserve edi
        test    ecx,3               ; test if string is aligned on 32 bits
        je      short find_end_of_dest_string_loop

dest_misaligned:                    ; simple byte loop until string is aligned
        mov     al,byte ptr [ecx]
        add     ecx,1
        test    al,al
        je      short start_byte_3
        test    ecx,3
        jne     short dest_misaligned

        align   4

find_end_of_dest_string_loop:
        mov     eax,dword ptr [ecx] ; read 4 bytes
        mov     edx,7efefeffh
        add     edx,eax
        xor     eax,-1
        xor     eax,edx
        add     ecx,4
        test    eax,81010100h
        je      short find_end_of_dest_string_loop
        ; found zero byte in the loop
        mov     eax,[ecx - 4]
        test    al,al               ; is it byte 0
        je      short start_byte_0
        test    ah,ah               ; is it byte 1
        je      short start_byte_1
        test    eax,00ff0000h       ; is it byte 2
        je      short start_byte_2
        test    eax,0ff000000h      ; is it byte 3
        je      short start_byte_3
        jmp     short find_end_of_dest_string_loop
                                    ; taken if bits 24-30 are clear and bit
                                    ; 31 is set
start_byte_3:
        lea     edi,[ecx - 1]
        jmp     short copy_start
start_byte_2:
        lea     edi,[ecx - 2]
        jmp     short copy_start
start_byte_1:
        lea     edi,[ecx - 3]
        jmp     short copy_start
start_byte_0:
        lea     edi,[ecx - 4]
;       jmp     short copy_start

;       edi points to the end of dest string.
copy_start::
        mov     ecx,[esp+0ch]       ; ecx -> sorc string
        test    ecx,3               ; test if string is aligned on 32 bits
        je      short main_loop_entrance

src_misaligned:                     ; simple byte loop until string is aligned
        mov     dl,byte ptr [ecx]
        add     ecx,1
        test    dl,dl
        je      short byte_0
        mov     [edi],dl
        add     edi,1
        test    ecx,3
        jne     short src_misaligned
        jmp     short main_loop_entrance

main_loop:                          ; edx contains first dword of sorc string
        mov     [edi],edx           ; store one more dword
        add     edi,4               ; kick dest pointer
main_loop_entrance:
        mov     edx,7efefeffh
        mov     eax,dword ptr [ecx] ; read 4 bytes

        add     edx,eax
        xor     eax,-1

        xor     eax,edx
        mov     edx,[ecx]           ; it's in cache now

        add     ecx,4               ; kick dest pointer
        test    eax,81010100h

        je      short main_loop
        ; found zero byte in the loop
; main_loop_end:
        test    dl,dl               ; is it byte 0
        je      short byte_0
        test    dh,dh               ; is it byte 1
        je      short byte_1
        test    edx,00ff0000h       ; is it byte 2
        je      short byte_2
        test    edx,0ff000000h      ; is it byte 3
        je      short byte_3
        jmp     short main_loop     ; taken if bits 24-30 are clear and bit
                                    ; 31 is set
byte_3:
        mov     [edi],edx
        mov     eax,[esp+8]         ; return in eax pointer to dest string
        pop     edi
        ret
byte_2:
        mov     [edi],dx
        mov     eax,[esp+8]         ; return in eax pointer to dest string
        mov     byte ptr [edi+2],0
        pop     edi
        ret
byte_1:
        mov     [edi],dx
        mov     eax,[esp+8]         ; return in eax pointer to dest string
        pop     edi
        ret
byte_0:
        mov     [edi],dl
        mov     eax,[esp+8]         ; return in eax pointer to dest string
        pop     edi
        ret

strcat  endp

        end
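The listing reads four bytes at a time and uses the constants 7efefeffh and 81010100h to decide whether a dword might contain the terminating zero byte. A rough C++ rendering of that test might look as follows. It is a sketch for illustration only: the function names are hypothetical, and as the comment in the listing notes, the test can report a false hit when bits 24-30 are clear and bit 31 is set, so a hit must still be confirmed byte by byte.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the add / xor / test sequence from the listing. A non-zero result
// means the 32-bit word may contain a zero byte and needs a byte-wise check;
// a zero result means no byte of the word is zero.
inline std::uint32_t maybe_has_zero_byte(std::uint32_t word)
{
    const std::uint32_t magic = 0x7efefeffu;
    return ((word + magic) ^ ~word) & 0x81010100u;
}

// Reference check used to confirm a hit, one byte at a time.
inline bool has_zero_byte(std::uint32_t word)
{
    for (int shift = 0; shift < 32; shift += 8)
        if (((word >> shift) & 0xFFu) == 0)
            return true;
    return false;
}
```

For instance, maybe_has_zero_byte(0x41424344) is 0, while 0x41420044 produces a non-zero result. The word 0x80424344 also produces a non-zero result although none of its bytes is zero, which is the false-hit case mentioned above.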
Well, can you write a "pseudo base-10" itoa that is 50% or more faster than the MS xtoa?
sclzmbie(忘我) ( ) Reputation: 96  2006-06-05 16:09:00  Score: 0

   Well, can you write a "pseudo base-10" itoa that is 50% or more faster than the MS xtoa?

================================
Why 50% faster? I only said much faster! 30% for sure.
char *myitoa10( int value, char *str )
{
    int nCount = -1, nIndex;
    char *pStr = str, nTemp;
    if ( value < 0 )
    { // negative: write a leading minus sign and move past it
        *pStr++ = '-';
        value = -value; // make it positive
    }
    do { // convert one digit per iteration until done
        pStr[ ++nCount ] = value % 10 + '0';
        value /= 10;
    } while( value > 0 ); // the digits are now in reverse order
    nIndex = ( nCount + 1 ) / 2; // half the length
    while( nIndex-- > 0 ) { // reverse the digits in place
        nTemp = pStr[ nIndex ];
        pStr[ nIndex ] = pStr[ nCount - nIndex ];
        pStr[ nCount - nIndex ] = nTemp;
    }
    pStr[ nCount + 1 ] = '\0'; // write the terminator
    return str;
}
//============================= The test code follows
class CTimer
{
public:
__forceinline CTimer( void )
{
QueryPerformanceFrequency( &m_Frequency );
QueryPerformanceCounter( &m_StartCount );
}
__forceinline void Reset( void )
{
QueryPerformanceCounter( &m_StartCount );
}
__forceinline double End( void )
{
static __int64 nCurCount;
QueryPerformanceCounter( (PLARGE_INTEGER)&nCurCount );
return double( nCurCount - ( *(__int64*)&m_StartCount ) ) / double( *(__int64*)&m_Frequency );
}
private:
LARGE_INTEGER m_Frequency;
LARGE_INTEGER m_StartCount;
};

int _tmain(int argc, _TCHAR* argv[])
{
    Sleep( 2000 );
    char szBuf[10];
    CTimer t;
    for ( int i = -10000000; i < 10000000; i++ )
    {
        myitoa10( i, szBuf );
    }
    cout << t.End() << endl;
    t.Reset();
    for ( int i = -10000000; i < 10000000; i++ )
    {
        itoa( i, szBuf, 10 );
    }
    cout << t.End() << endl;
    system( "pause" );
    return 0;
}
//======================================== Output:
3.98709
6.05084
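Before the PII appeared, multiplication was quite expensive, so much so that the classic optimization of the day was to turn a certain class of multiplications into shifts and additions. Today, on the PII, multiplication, like most other operations, completes in a single instruction cycle.
to Snow_Ice11111: Your idea is right, but such code only pays off when handling numbers of eight digits (four swaps when reversing) or more. log, pow and the like all invoke the CPU's advanced arithmetic instructions and are very inefficient, whereas swapping variables is something the CPU does very quickly. So it depends on what numbers are being processed. Generally the numbers users handle are probably not very large, so reversing directly is likely to be more efficient.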
hai1039(天下): Yes, multiplication is nothing special these days. A comparison of assembly instructions:
int _tmain(int argc, _TCHAR* argv[])
{
int a = 22222222, b = 11111111;
Sleep( 2000 );
CTimer t;
for ( int i = 0; i < 100000000; i++ )
{
__asm
{
mov eax, dword ptr[a]
imul eax, dword ptr[b]
}
}
cout << t.End() << endl; t.Reset();
for ( int i = 0; i < 100000000; i++ )
{
__asm
{
mov eax, dword ptr[a]
add eax, dword ptr[b]
}
}
cout << t.End() << endl; t.Reset();
for ( int i = 0; i < 100000000; i++ )
{
__asm
{
mov eax, dword ptr[a]
cdq
idiv dword ptr[b]
}
}
cout << t.End() << endl; system( "pause" );
return 0;
}
As you can see, imul and add are neck and neck, with imul even a tiny bit faster than add, but division is still quite slow!
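Anyway, I haven't run any tests, so I won't say much about efficiency. Some advice for the original poster: I once wrote my own memcpy, and at the time it was also faster than the runtime's.
But later, after various optimizations and tests on different CPUs (taking pipeline optimization into account), single-threaded and multi-threaded, and so on,
I eventually found that results sometimes differ from system to system. Later still, I heard a piece of advice from an expert: don't rewrite the C runtime.
Come to think of it, the runtime has been tempered over a long time, so there really isn't much point.
As for your claim that converting 1 million numbers takes only 8.5 seconds while VC's itoa takes 10.2 seconds:
ordinary users simply never need to convert that many numbers. Of course, a challenge is still a good thing :)
to madmanahong: Your memcpy is faster than VC's? Mind you, VC's memcpy is implemented in assembly and uses the stos instruction. How is your algorithm implemented?
For basic algorithms, multithreading does more harm than good.
I haven't rewritten the VC library, nor could I. I just wanted to use this occasion for a bit of a code-optimization contest.
Algorithms should always strive for efficiency; a programmer can never know how users will abuse his program.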
Just for fun, don't think too much :-)
Here are the average times on my machine:
ms time: 1.682000
my time: 1.702000
my2 time: 1.783000
---------------------------------
I just modified the test code a little:
int main(int argc, char* argv[])
{
     char buf[32];
     int i;
     clock_t start, end;

     // avoid code delay
     for(i=0; i<100000; i++)
     {
         xtoa((unsigned long)(unsigned int)i, buf, 10, 0);
         myitoa(i, buf, 10);
         myitoa10(i, buf);
     }

     start = clock();
     for(i=0; i<10000000; i++)
     {
         xtoa((unsigned long)(unsigned int)i, buf, 10, 0);
     }
     end = clock();
     printf("ms time: %f\n", (end-start) * 1.0/CLOCKS_PER_SEC);

     start = clock();
     for(i=0; i<10000000; i++)
     {
         myitoa(i, buf, 10);
     }
     end = clock();
     printf("my time: %f\n", (end-start) * 1.0/CLOCKS_PER_SEC);

     start = clock();
     for(i=0; i<10000000; i++)
     {
         myitoa10(i, buf);
     }
     end = clock();
     printf("my2 time: %f\n", (end-start) * 1.0/CLOCKS_PER_SEC);

     getch();
     return 0;
}
Anyway, using a pointer directly is faster than using array indexing.
The following is your string reverse:
---------------------
while( nIndex-- > 0 ) { // reverse the digits in place
    nTemp = pStr[ nIndex ];
    pStr[ nIndex ] = pStr[ nCount - nIndex ];
    pStr[ nCount - nIndex ] = nTemp;
}
pStr[ nCount + 1 ] = '\0'; // write the terminator
---------------------
Here is the MS string reverse:
---------------------
        *p-- = '\0';            /* terminate string; p points to last digit */

        do {
            temp = *p;
            *p = *firstdig;
            *firstdig = temp;   /* swap *p and *firstdig */
            --p;
            ++firstdig;         /* advance to next two digits */
        } while (firstdig < p); /* repeat until halfway */
---------------------
Although the difference is only a matter of milliseconds, it does add up if you run it many times.
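sclzmbie, your test results are extremely unreliable. Did you turn on optimization? Heh. Also, I didn't use pointers because indexing memory through a pointer takes longer than using stack memory directly.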
// the digits are now in reverse order
nIndex = ( nCount + 1 ) / 2; // half the length
while( nIndex-- > 0 ) { // reverse the digits in place
    nTemp = pStr[ nIndex ];
    pStr[ nIndex ] = pStr[ nCount - nIndex ];
    pStr[ nCount - nIndex ] = nTemp;
}
}
pStr[ nCount + 1 ] = '\0'; // write the terminator
You could use strrev(pStr) for this.
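strrev is not part of standard C or C++, so it is not available everywhere. A portable stand-in can be sketched with std::reverse from the standard library. The function names below are hypothetical and this is only an illustration of the suggestion, not a claim about which approach is faster.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>

// Reverses a NUL-terminated string in place using the standard library,
// as a portable stand-in for the non-standard strrev.
char *reverse_in_place(char *s)
{
    std::reverse(s, s + std::strlen(s));
    return s;
}

// Hypothetical convenience wrapper so the result can be compared as a string.
std::string reversed(std::string s)
{
    reverse_in_place(&s[0]);
    return s;
}
```

Inside an itoa the length is already known once the digit loop ends, so std::reverse could be given the first and one-past-last digit pointers directly, which avoids the extra strlen pass.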
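to mgdcs:
Calling functions inside an algorithm is a cardinal sin for algorithms with very demanding performance requirements. Microsoft's programmers know this well, so they very rarely use recursion; every recursive algorithm has been implemented with a loop plus a stack instead.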
Thanks for the reminder,
I've learned something,
but I think there are still places where your program could be improved. For example:
static char szMap[] = {
    '0', '1', '2', '3', '4', '5',
    '6', '7', '8', '9', 'a', 'b',
    'c', 'd', 'e', 'f', 'g', 'h',
    'i', 'j', 'k', 'l', 'm', 'n',
    'o', 'p', 'q', 'r', 's', 't',
    'u', 'v', 'w', 'x', 'y', 'z'
}; // digit-to-character map
could be dropped, and then pStr[ ++nCount ] = szMap[ nValue % radix ];
changed to: pStr[ ++nCount ] = ((nValue % radix)>9?(nValue % radix+'a'):(nValue % radix -'0'));
I mistyped that. It should be: pStr[ ++nCount ] = ((nValue % radix)>9?(nValue % radix-10+'a'):(nValue % radix +'0'));
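The two ways of turning a digit value into a character discussed here, the lookup table and the corrected arithmetic expression, can be put side by side in a small sketch. The function names are hypothetical, and the arithmetic version assumes an ASCII-compatible character set in which 'a' through 'z' are contiguous. The sketch only shows that both produce the same characters; which one is faster would need to be measured.

```cpp
#include <cassert>

// Digit-to-character via a lookup table, in the style of the original myitoa.
inline char digit_by_table(unsigned int d)
{
    static const char map[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    assert(d < 36);
    return map[d];
}

// Digit-to-character via arithmetic, following the corrected suggestion.
// Assumes 'a'..'z' are contiguous, as they are in ASCII.
inline char digit_by_arithmetic(unsigned int d)
{
    assert(d < 36);
    return static_cast<char>(d > 9 ? d - 10 + 'a' : d + '0');
}

// Returns true when both mappings agree for every digit value 0..35.
inline bool mappings_agree()
{
    for (unsigned int d = 0; d < 36; ++d)
        if (digit_by_table(d) != digit_by_arithmetic(d))
            return false;
    return true;
}
```

The table costs 36 bytes of static data and one memory load per digit, while the arithmetic version costs a comparison and a branch or conditional move per digit.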
I wrote one too:
char* delos::itoa(int n, int radix){
    char* p;
    int minus;
    static char buf[36];

    p = buf + 36;
    *--p = '\0';

    if (n < 0){
        minus = 1;
        n = -n;
    }
    else
        minus = 0;

    if (n == 0)
        *--p = '0';
    else
        while (n>0) {
            *--p = "0123456789ABCDEF"[n % radix]; // digit string only covers radix up to 16
            n /= radix;
        }
    if (minus)
        *--p = '-';
    return p;
}