Regex for filtering punctuation out of HTML doesn't filter cleanly, puzzled!

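I'm using [^a-zA-Z0-9\\s]+ to filter punctuation out of text. Why does it remove some punctuation but leave some behind? When I copied the text from the txt file into regextester and ran the match there, everything matched. Why is that? Could the document's encoding be the problem? The whole program works like this: take the results of a Google search, save the HTML source to a txt file, strip out the messy markup, and keep the title and description of each search result. The punctuation-removal step is where it goes wrong. The txt file is read in as UTF-8.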

Solutions »

Function stripHTML(strHTML)
    'Strips the HTML tags from strHTML
    Dim objRegExp, strOutput
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True
    objRegExp.Global = True
    objRegExp.Pattern = "<.+?>"
    'Replace all HTML tag matches with the empty string
    strOutput = objRegExp.Replace(strHTML, "")
    'Replace any remaining < and > with &lt; and &gt;
    strOutput = Replace(strOutput, "<", "&lt;")
    strOutput = Replace(strOutput, ">", "&gt;")
    stripHTML = strOutput 'Return the value of strOutput
    Set objRegExp = Nothing
End Function
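The post doesn't show which characters were left behind, so this is only a guess: one common cause is HTML entities such as &amp; or &quot; still sitting in the text. The pattern removes the & and the ; but keeps the letters between them, so it looks as if the filtering was incomplete. Below is a minimal Python sketch of the tag-stripping and punctuation steps under that assumption. The function names and the sample string are made up for illustration and are not taken from the original program.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")                 # a simple HTML tag matcher
NON_ALNUM_RE = re.compile(r"[^a-zA-Z0-9\s]+")   # the pattern from the question


def naive_strip(raw_html):
    """Strip tags, then punctuation, without decoding entities first."""
    text = TAG_RE.sub(" ", raw_html)
    text = NON_ALNUM_RE.sub("", text)
    return " ".join(text.split())


def strip_punctuation(raw_html):
    """Strip tags, decode entities, then replace punctuation with spaces."""
    text = TAG_RE.sub(" ", raw_html)
    text = html.unescape(text)           # &amp; -> &, &quot; -> ", &nbsp; -> U+00A0
    text = NON_ALNUM_RE.sub(" ", text)   # a space keeps neighbouring words apart
    return " ".join(text.split())        # collapse runs of whitespace


# Hypothetical sample, not taken from the poster's data
sample = '<b>Tom &amp; Jerry</b> &quot;classic&quot;&nbsp;cartoon!'
print(naive_strip(sample))        # entity names survive as ordinary letters
print(strip_punctuation(sample))  # only the words remain
```

If the leftovers in the real file are not entity names, the next thing to check would be how the pattern is escaped in the program compared with how it was typed into regextester.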
Bro, I've got to get to sleep.
The dorm is about to cut the power.
This is all I can manage for now.
Hope it helps!
^_*