I want to strip out all HTML tags, so I match them with:

    <(?<temp>[^<>]*?)>

and then delete every match. But I ran into cases like this one:

    <img src="http://www.li20.net/attachments/month_0511/1_kGDDfoOuetcQ.jpg" border="0" onload="if(this.width>screen.width*0.7) {this.resized=true; this.width=screen.width*0.7; this.alt='Click here to open new window\nCTRL+Mouse wheel to zoom in/out';}" onmouseover="if(this.width>screen.width*0.7) {this.resized=true; this.width=screen.width*0.7; this.style.cursor='hand'; this.alt='Click here to open new window\nCTRL+Mouse wheel to zoom in/out';}" onclick="if(!this.resized) {return true;} else {window.open('http://www.li20.net/attachments/month_0511/1_kGDDfoOuetcQ.jpg');}" onmousewheel="return imgzoom(this);">

Some of the angle brackets in it have no HTML meaning (they are comparison operators in the inline JavaScript). Because [^<>]*? cannot cross a >, the match ends at the first > it meets, so instead of the whole tag the match can come out as:

    <img src="http://www.li20.net/attachments/month_0511/1_kGDDfoOuetcQ.jpg" border="0" onload="if(this.width>
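How can I keep these non-HTML < and > characters from being treated as tag delimiters? Cases like this show up on all kinds of sites and pages, so I need a general-purpose approach. Thanks.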
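A minimal C# sketch of the problem and of one common workaround, assuming every attribute value is quoted. It compares the question's pattern with a quote-aware pattern that consumes each "..." or '...' value as a single unit, so a > inside onload="..." can no longer end the match. The class name TagPatterns and the sample markup are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagPatterns
{
    // Pattern from the question: [^<>]*? cannot cross a '>', so the match
    // stops at the first '>' even when it sits inside a quoted attribute value.
    public static readonly Regex Naive = new Regex("<(?<temp>[^<>]*?)>");

    // Quote-aware sketch: inside a tag, consume a whole "..." or '...' value
    // as one unit; otherwise accept any character except a quote or '>'.
    public static readonly Regex QuoteAware =
        new Regex("<(?:\"[^\"]*\"|'[^']*'|[^'\">])*>");

    public static void Main()
    {
        string html = "<img src=\"1.jpg\" onload=\"if(this.width>screen.width*0.7) {this.resized=true;}\">caption";

        // Prints: <img src="1.jpg" onload="if(this.width>
        Console.WriteLine(Naive.Match(html).Value);

        // Prints: screen.width*0.7) {this.resized=true;}">caption
        Console.WriteLine(Naive.Replace(html, ""));

        // Prints: caption
        Console.WriteLine(QuoteAware.Replace(html, ""));
    }
}
```

This is still a heuristic. A regex like this will also mangle unquoted attribute values that contain >, HTML comments that contain >, and a bare < in ordinary text; the code inside <script> and <style> elements is not a tag, so it stays in the output, and any < inside it can be misread as the start of one. When those cases matter, an HTML parser (for example the third-party HtmlAgilityPack library) is the more reliable route.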

Solution

  1.   

    <%@Page Language="c#" Debug="true"%>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="zh-CN" lang="zh-CN">
    <head>
    <title> New Document </title>
    <meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
    <meta name="title" content="" />
    <meta name="subject" content="" />
    <meta name="language" content="zh-CN" />
    <meta name="keywords" content="" />
    <meta name="robots" content="all" />
    <script language="c#" runat="server">
       
    void Page_Load(object o , EventArgs e)
    {

    if(!Page.IsPostBack)
    {
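    string s = @"<span class='lframe-t-text'>Today's <font color='red'>hot news</font></span><div>adfadf</div><img src='1.jpg' onload='if(this.width>screen.width*0.7){this.resized=true;}' />";
    // Skip quoted attribute values as whole units, so a '>' inside one (e.g. onload='if(a>b)...') does not end the tag early.
    s = System.Text.RegularExpressions.Regex.Replace(s, @"<(?:""[^""]*""|'[^']*'|[^'"">])*>", "");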
    Response.Write(s);
    }

    }</script>
    </head>
    <body>
    <form id="frm" runat="server"></form>
    </body>
    </html>
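Another option, sketched here under the assumption that attribute values are quoted, avoids regular expressions altogether: a small character scanner that remembers whether it is inside a tag and, if so, inside a quoted value. TagScanner and StripTags are hypothetical names, not part of ASP.NET.

```csharp
using System;
using System.Text;

public static class TagScanner
{
    // Hypothetical helper: removes tags by walking the string once and
    // tracking two pieces of state: "inside a tag" and "inside a quoted value".
    public static string StripTags(string html)
    {
        var text = new StringBuilder(html.Length);
        bool inTag = false;
        char quote = '\0'; // '\0' means "not inside a quoted attribute value"

        foreach (char c in html)
        {
            if (!inTag)
            {
                if (c == '<') inTag = true;    // start of a tag
                else text.Append(c);           // ordinary text is kept
            }
            else if (quote != '\0')
            {
                if (c == quote) quote = '\0';  // matching quote ends the value
            }
            else if (c == '"' || c == '\'')
            {
                quote = c;                     // opening quote: '>' is now ignored
            }
            else if (c == '>')
            {
                inTag = false;                 // unquoted '>' closes the tag
            }
        }
        return text.ToString();
    }

    public static void Main()
    {
        string html = "<p title=\"a > b\">x &gt; y</p><img onload='if(w>h){}'>done";
        // Prints: x &gt; ydone
        Console.WriteLine(StripTags(html));
    }
}
```

Keeping the state explicit makes later extensions, such as skipping <!-- --> comments, easier to add than in a single pattern. Like any approach that does not fully parse HTML, it still treats a bare < in ordinary text as the start of a tag.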