http://www.mndsoft.com/blog/article.asp?id=605
A reader asked for code to convert GBK to UTF-8. I have not found any yet, so here is a UTF-8 to GB2312 converter to study by comparison.
It still doesn't seem to work. My VBScript regular expression for Chinese is: regTest.Pattern = "^[\u4E00-\u9FA5]+$"
Converting back to ANSI first and then to GB, with temp = UTF2GB(StrConv(temp, vbFromUnicode)), doesn't seem to help either.
'*/-------------------------------------------------------------
'*/ Module:   mUTF8
'*/ Purpose:  UTF-8 to GB2312 conversion function
'*/ Example:  UTF2GB("%E9%83%BD%E5%B8%82%E6%83%85%E7%B7%A3 %E6%98%9F%E5%BA%A7")
'*/ Created:  2004-11
'*/ Modified:
'*/ Author:   based on material found online
'*/ Contact:  [email protected] Http://www.mndsoft.com
'*/-------------------------------------------------------------
Public Function UTF2GB(UTFStr As String) As String
    For Dig = 1 To Len(UTFStr)
        If Mid(UTFStr, Dig, 1) = "%" Then
            If Len(UTFStr) >= Dig + 8 Then
                GBStr = GBStr & ConvChinese(Mid(UTFStr, Dig, 9))
                Dig = Dig + 8
            Else
                GBStr = GBStr & Mid(UTFStr, Dig, 1)
            End If
        Else
            GBStr = GBStr & Mid(UTFStr, Dig, 1)
        End If
    Next
    UTF2GB = GBStr
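End Function

' Decodes one "%XX%XX%XX" group (a three-byte UTF-8 sequence) into a character.
Public Function ConvChinese(X)
    A = Split(Mid(X, 2), "%")
    i = 0
    j = 0

    ' Turn each hex byte into an 8-character binary string
    For i = 0 To UBound(A)
        A(i) = c16to2(A(i))
    Next

    For i = 0 To UBound(A) - 1
        ' The count of leading 1 bits in the lead byte (DigS - 1) is the sequence length
        DigS = InStr(A(i), "0")
        Unicode = ""
        For j = 1 To DigS - 1
            If j = 1 Then
                ' Lead byte: drop the length prefix and the 0 that ends it
                A(i) = Right(A(i), Len(A(i)) - DigS)
                Unicode = Unicode & A(i)
            Else
                ' Continuation byte: drop the "10" prefix
                i = i + 1
                A(i) = Right(A(i), Len(A(i)) - 2)
                Unicode = Unicode & A(i)
            End If
        Next

        If Len(c2to16(Unicode)) = 4 Then
            ConvChinese = ConvChinese & ChrW(Int("&H" & c2to16(Unicode)))
        Else
            ConvChinese = ConvChinese & Chr(Int("&H" & c2to16(Unicode)))
        End If
    Next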
End Function

Public Function c2to16(X)
    i = 1
    For i = 1 To Len(X) Step 4
        c2to16 = c2to16 & Hex(c2to10(Mid(X, i, 4)))
    Next
End Function

Public Function c2to10(X)
    c2to10 = 0
    If X = "0" Then Exit Function
    i = 0
    For i = 0 To Len(X) - 1
        If Mid(X, Len(X) - i, 1) = "1" Then c2to10 = c2to10 + 2 ^ (i)
    Next
End Function

' Hex string -> binary string, four bits per hex digit.
Public Function c16to2(X)
    i = 0
    For i = 1 To Len(Trim(X))
        TempStr = c10to2(CInt(Int("&h" & Mid(X, i, 1))))
        ' Left-pad each digit to four bits
        Do While Len(TempStr) < 4
            TempStr = "0" & TempStr
        Loop
        c16to2 = c16to2 & TempStr
    Next
End Function

' Decimal integer -> binary string without leading zeros (e.g. 5 -> "101").
Public Function c10to2(X)
    mysign = Sgn(X)
    X = Abs(X)
    ' Find how many binary digits are needed
    DigS = 1
    Do
        If X < 2 ^ DigS Then
            Exit Do
        Else
            DigS = DigS + 1
        End If
    Loop
    tempnum = X
    i = 0
    For i = DigS To 1 Step -1
        If tempnum >= 2 ^ (i - 1) Then
            tempnum = tempnum - 2 ^ (i - 1)
            c10to2 = c10to2 & "1"
        Else
            c10to2 = c10to2 & "0"
        End If
    Next
    If mysign = -1 Then c10to2 = "-" & c10to2
End Function
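UTF2GB always consumes nine characters per "%", so it only handles three-byte sequences, and it goes through binary strings to do the bit work. The same decoding can be done with integer arithmetic. The following is a minimal sketch, assuming well-formed input limited to one- to three-byte sequences (the Basic Multilingual Plane); DecodePercentUtf8 and HexByteAt are hypothetical names and are not part of mUTF8.

```vb
Option Explicit

' Hypothetical helper: value (0..255) of the "%XX" escape that starts at pos.
' Assumes the two characters after "%" are hex digits.
Private Function HexByteAt(ByVal s As String, ByVal pos As Long) As Long
    HexByteAt = CLng("&H" & Mid$(s, pos + 1, 2))
End Function

' Hypothetical sketch: decode percent-encoded UTF-8 (1- to 3-byte sequences).
' Continuation escapes are assumed to be well formed and are not validated.
Public Function DecodePercentUtf8(ByVal s As String) As String
    Dim i As Long, k As Long
    Dim b As Long, cp As Long, extra As Long
    Dim result As String

    i = 1
    Do While i <= Len(s)
        extra = -1                              ' -1 means "copy this character as-is"
        If Mid$(s, i, 3) Like "%[0-9A-Fa-f][0-9A-Fa-f]" Then
            b = HexByteAt(s, i)
            If b < &H80 Then
                cp = b: extra = 0               ' 0xxxxxxx: single byte
            ElseIf (b And &HE0) = &HC0 Then
                cp = b And &H1F: extra = 1      ' 110xxxxx: two-byte sequence
            ElseIf (b And &HF0) = &HE0 Then
                cp = b And &HF: extra = 2       ' 1110xxxx: three-byte sequence
            End If
            ' Too little input left for the continuation escapes: copy literally
            If extra > 0 And i + 3 * extra + 2 > Len(s) Then extra = -1
        End If

        If extra < 0 Then
            result = result & Mid$(s, i, 1)
            i = i + 1
        Else
            For k = 1 To extra
                ' Each continuation byte (10xxxxxx) contributes its low six bits
                cp = cp * 64 + (HexByteAt(s, i + 3 * k) And &H3F)
            Next k
            result = result & ChrW(cp)
            i = i + 3 * (extra + 1)
        End If
    Loop
    DecodePercentUtf8 = result
End Function
```

For the first group of the module's example, %E9%83%BD, the arithmetic is 9 * 4096 + 3 * 64 + 61 = 37117, which is U+90FD.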
Reading a UTF-8 file and testing each line:
Open tempstring For Input As #2
Do While Not EOF(2)
    Line Input #2, temp
    temp = UTF2GB(temp)
    If regTest.Test(temp) Then
        Cells(j, COL_FILE) = tempstring
        Cells(j, COL_ERROR) = "include chinese!"
    End If
Loop
Close #2
Could someone tell me where this goes wrong? It simply never detects the Chinese.
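By default the file is assumed to be ANSI, so reading it into memory triggers an automatic ANSI-to-Unicode conversion.
For text files that are not ANSI, it is best to read in binary mode.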
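A binary read yields raw UTF-8 bytes, which still have to be decoded before they can be tested as text. The following is a minimal sketch of that idea, assuming the file contains only one- to three-byte sequences; ReadUtf8Binary is a hypothetical name, not something from the thread.

```vb
Option Explicit

' Hypothetical sketch: read a file as bytes and decode UTF-8 by hand.
' Handles 1- to 3-byte sequences only; other lead bytes become U+FFFD.
' A leading byte order mark, if present, is kept as U+FEFF.
Public Function ReadUtf8Binary(ByVal path As String) As String
    Dim f As Integer, n As Long, i As Long, k As Long
    Dim extra As Long, cp As Long
    Dim buf() As Byte, result As String

    f = FreeFile
    Open path For Binary Access Read As #f
    n = LOF(f)
    If n > 0 Then
        ReDim buf(0 To n - 1)       ' exactly n bytes; ReDim buf(n) would allocate n + 1
        Get #f, , buf
    End If
    Close #f

    i = 0
    Do While i < n
        cp = buf(i)
        If cp < &H80 Then
            extra = 0                           ' 0xxxxxxx: single byte
        ElseIf (cp And &HE0) = &HC0 Then
            cp = cp And &H1F: extra = 1         ' 110xxxxx: two-byte sequence
        ElseIf (cp And &HF0) = &HE0 Then
            cp = cp And &HF: extra = 2          ' 1110xxxx: three-byte sequence
        Else
            cp = &HFFFD&: extra = 0             ' unsupported or stray byte
        End If
        If i + extra >= n Then Exit Do          ' sequence truncated at end of file
        For k = 1 To extra
            ' Each continuation byte contributes its low six bits
            cp = cp * 64 + (buf(i + k) And &H3F)
        Next k
        result = result & ChrW(cp)
        i = i + extra + 1
    Loop
    ReadUtf8Binary = result
End Function
```

The returned value is an ordinary VB string, so it can be passed to a regular expression test without going through StrConv or UTF2GB.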
Also:
Open tempstring For Binary As #1
ReDim byteTemp(FileLen(tempstring)) As Byte
Get #1, , byteTemp
temp = StrConv(byteTemp, vbFromUnicode)
temp = UTF2GB(temp)
If regTest.Test(temp) Then
    Cells(j, COL_FILE) = tempstring
    Cells(j, COL_ERROR) = "include chinese!"
End If
Close #1
The code above reads the file in binary mode, but it still cannot detect the Chinese. Is there something wrong with my regular expression?
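Reading the file into memory triggers an automatic ANSI-to-Unicode conversion.
But the file is not actually ANSI, so converting it as ANSI scrambles the content, and no amount of further conversion will recover it.
Read in binary mode, what you get is simply the UTF-8 encoded data.
temp = StrConv(byteTemp, vbFromUnicode)
What is this line meant to do?
With binary mode, just read straight into a string.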
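If decoding the bytes by hand is not wanted, a commonly used alternative is to let an ADODB.Stream object do the decoding. This is a different technique from the binary read discussed here, and the sketch below assumes ADO is available on the machine; ReadUtf8File is a hypothetical name.

```vb
Option Explicit

' Hypothetical sketch: load a UTF-8 text file through ADODB.Stream (late bound).
Public Function ReadUtf8File(ByVal path As String) As String
    Dim stm As Object
    Set stm = CreateObject("ADODB.Stream")
    stm.Type = 2                  ' 2 = adTypeText
    stm.Charset = "utf-8"         ' decode the file contents as UTF-8
    stm.Open
    stm.LoadFromFile path
    ReadUtf8File = stm.ReadText   ' with no argument, reads to the end of the stream
    stm.Close
End Function
```

The result is a normal VB string, so no StrConv call is needed afterwards.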
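I passed byteTemp straight to regTest.Test(byteTemp), and it still fails to recognize the Chinese.
My regular expression is written as: regTest.Pattern = "^[\u4E00-\u9FA5]+$"
Can you tell me why it still cannot detect Chinese?
Is the regular expression itself the problem?
Thanks.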
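Two things are worth checking, and they are offered as likely causes rather than a confirmed diagnosis. First, UTF2GB only rewrites "%XX" escapes and copies every other character unchanged, so raw file content passes through it undecoded. Second, the pattern "^[\u4E00-\u9FA5]+$" is anchored at both ends, so it matches only when the entire string consists of characters in that range; a line that mixes Chinese with anything else fails the test. A minimal sketch of a "contains any Chinese" check, assuming the text has already been decoded into a normal VB string; ContainsChinese is a hypothetical name.

```vb
Option Explicit

' Hypothetical sketch: True if text contains at least one character
' in the range U+4E00..U+9FA5 (the range used in the thread's pattern).
Public Function ContainsChinese(ByVal text As String) As Boolean
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "[\u4E00-\u9FA5]"   ' no ^ and $ anchors: one match is enough
    ContainsChinese = re.Test(text)
End Function
```

With the anchored pattern, a string such as "abc" followed by one Chinese character does not match, while the unanchored pattern above does.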