How can I strip the HTML tags out of some content, including nested tags and their attributes, so that only the text is left? For example:
<SPAN lang=EN><FONT face="Times New Roman">测试文字</FONT></SPAN>
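should be filtered so that only "测试文字" is left. The content is read from a database and contains these tags. Could someone please help? It's fairly urgent. Thanks!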
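// Requires: using System.Text;
static string StripTags(string str)
{
    var sb = new StringBuilder();
    bool skip = false;   // true while we are inside a <...> tag
    foreach (char ch in str)
    {
        if (ch == '<')
            skip = true;
        else if (ch == '>')
            skip = false;
        else if (!skip)
            sb.Append(ch);   // keep only characters outside of tags
    }
    return sb.ToString();
}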
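One thing to watch with the simple scan above: a '>' inside a quoted attribute value (for example title="a>b") ends the tag too early, so part of the attribute leaks into the output. Here is a rough sketch of a quote-aware variant. The class and method names are made up for illustration, and it assumes reasonably well-formed markup; it is not a full HTML parser.

```csharp
using System.Text;

static class QuoteAwareStripper
{
    // Hypothetical helper: same character scan, but '>' inside a quoted
    // attribute value does not close the tag.
    public static string StripTags(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool inTag = false;
        char quote = '\0';   // quote character currently open inside a tag, or '\0'

        foreach (char ch in input)
        {
            if (!inTag)
            {
                if (ch == '<') inTag = true;   // a tag starts here
                else sb.Append(ch);            // ordinary text, keep it
            }
            else if (quote != '\0')
            {
                if (ch == quote) quote = '\0'; // closing quote of the attribute value
            }
            else if (ch == '"' || ch == '\'')
            {
                quote = ch;                    // opening quote of an attribute value
            }
            else if (ch == '>')
            {
                inTag = false;                 // the tag ends here
            }
        }
        return sb.ToString();
    }
}
```

Call it as QuoteAwareStripper.StripTags(html). Like the original loop, it treats any '<' in the text as the start of a tag, so a stray '<' in the content will still swallow what follows it.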
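// Requires: using System.Text.RegularExpressions;
string html = "<SPAN lang=EN> <FONT face=\"Times New Roman\">测试文字 </FONT> </SPAN> ";
// "<.*?>" lazily matches one tag at a time; Singleline lets '.' match newlines too,
// so a tag that is split across lines is still removed.
Regex reg = new Regex("<.*?>", RegexOptions.Compiled | RegexOptions.Singleline);
string newString = reg.Replace(html, string.Empty);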
Or, on the client side:
id.innerText
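Function nohtml(str)
    Dim re
    Set re = New RegExp
    re.IgnoreCase = True
    re.Global = True
    ' First pass: replace each <...> tag with a space
    re.Pattern = "(\<.[^\<]*\>)"
    str = re.Replace(str, " ")
    ' Second pass: replace any remaining closing tags with a space
    re.Pattern = "(\<\/[^\<]*\>)"
    str = re.Replace(str, " ")
    nohtml = str
    Set re = Nothing
End Function
The above is for ASP. In C#: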
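// Requires: using System.Text.RegularExpressions;
// Matches '<', then anything up to the next '>' (no leading space in the pattern,
// otherwise a tag at the very start of the text is not matched).
Regex reg = new Regex(@"<[^>]*>", RegexOptions.Compiled);
string newString = reg.Replace(this.TextBox1.Text, string.Empty);
Response.Write(newString);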
Regex reg = new Regex(@"<\/*[^<>]*>", RegexOptions.IgnoreCase);
string newString = reg.Replace(html, string.Empty);
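Since the content comes from a database, it may also contain entities such as &amp; or &nbsp; which the regex replies above leave untouched. A minimal sketch that strips the tags and then decodes entities, assuming .NET 4.0 or later for System.Net.WebUtility; the HtmlText class and ToPlainText method are hypothetical names, not something from the replies above.

```csharp
using System.Net;
using System.Text.RegularExpressions;

static class HtmlText
{
    // Hypothetical helper: remove tags first, then turn entities back into characters.
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
            return string.Empty;

        // '<', then everything up to the next '>'; [^>] also matches newlines.
        string noTags = Regex.Replace(html, @"<[^>]*>", string.Empty);

        // Decode entities such as &amp; and &lt; into the characters they stand for.
        string decoded = WebUtility.HtmlDecode(noTags);

        // Drop leading and trailing whitespace left behind by removed tags.
        return decoded.Trim();
    }
}
```

Usage would be something like string text = HtmlText.ToPlainText(html);. Decoding after stripping matters: if you decode first, an escaped &lt;b&gt; in the text turns into a real tag and gets removed. Regexes are fine for simple fragments like the one in the question, but for arbitrary or malformed HTML a real parser is the safer choice.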