Two questions about regular expressions

            string inputString = "<html>IIII<head><title>塞北的雪</title></head>UUUU<body><table><tr><td><a class='m' href='http://www.csdn.net'>CSDN</a></td><td><a class='m' href='http://blog.csdn.net/precipitant'>塞北的雪</a></td><td><a class='m' href='http://blog.csdn.net/net_lover'>好人</a></td></tr></table>我市一个好人，你是不是好人呢？</body></html>";
            StringBuilder sb = new StringBuilder();
            Regex reg = null;
            Match mch = null;
            reg = new Regex(@"<\s*?head\s*?>(.*?)</head\s*?>.*?<\s*?body\s*?>(.*?)</body.*?>", RegexOptions.IgnoreCase | RegexOptions.Compiled);
            for (mch = reg.Match(inputString); mch.Success; mch = mch.NextMatch())
            {
                sb.AppendLine("head：" + mch.Groups[1]);
                sb.AppendLine("body：" + mch.Groups[2]);
            }
            MessageBox.Show(sb.ToString());

            string inputString = "<html>IIII<head style=''><title>塞北的雪</title></head>UUUU<body style=''ssssss><table><tr><td><a class='m' href='http://www.csdn.net'>CSDN</a></td><td><a class='m' href='http://blog.csdn.net/precipitant'>塞北的雪</a></td><td><a class='m' href='http://blog.csdn.net/net_lover'>好人</a></td></tr></table>我市一个好人，你是不是好人呢？</body></html>";
            StringBuilder sb = new StringBuilder();
            Regex reg = null;
            Match mch = null;
            reg = new Regex(@"<\s*?head(\s+.*?)?>(.*?)</head\s*?>.*?<\s*?body(\s+.*?)?>(.*?)</body.*?>", RegexOptions.IgnoreCase | RegexOptions.Compiled);
            for (mch = reg.Match(inputString); mch.Success; mch = mch.NextMatch())
            {
                sb.AppendLine("attribute of head：" + mch.Groups[1]);
                sb.AppendLine("attribute of body：" + mch.Groups[3]);
                sb.AppendLine("head：" + mch.Groups[2]);
                sb.AppendLine("body：" + mch.Groups[4]);
            }
            MessageBox.Show(sb.ToString());

My HTML is not written on a single line, so the pattern does not match.

The source might look like this:
<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN" "http://www.wapforum.org/DTD/xhtml-mobile10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf8" />
<title>AST</title>
</head>
<body>
<div style="background-color:#8796BF">
<img src="logo.gif" alt="ACCESS CHINA TEST SITE" width="212" height="37" />
</div>
<h2 style="font-family:verdana">Welcome to Access Test Repository</h2>
<h5><a href="readme.htm">Read Me</a></h5>
<hr />
<ul>
<li>
<a href="Browser/index.htm">Browser</a></li>
<li>
<a href="email/index.htm">Email</a></li>
<li>
<a href="MMS/index.htm">MMS</a>
</li>
<li>
<a href="Multimedia/index.htm">Multimedia</a></li>
<li>
<a href="DocViewer/index.htm">Doc Viewer</a></li>
<li>
<a href="RSS/index.htm">RSS</a></li>
<li>
<a href="SyncML/index.htm">SyncML</a></li>
</ul>
<hr />
<p>
<small>Copyright 2005-2006 AccessChina Corp. All Rights Reserved Nanjing QA Dept. </small>
</p>
</body>
</html>

In other words, on some pages the head or body tag is followed by a line break, and on other pages it is not.
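One way to handle the line-break case is `RegexOptions.Singleline`, which lets `.` match `\n` so the lazy groups can span several lines. Below is a minimal sketch, not a definitive solution: the class name `HeadBodyExtractor` and the method `Extract` are made up for illustration, and the pattern is a slightly tightened variant of the one posted above (attributes are matched with `[^>]*`).

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeadBodyExtractor   // hypothetical helper, not from the thread
{
    // Singleline makes '.' match '\n' too, so (.*?) can cross line breaks.
    private static readonly Regex HeadBodyPattern = new Regex(
        @"<\s*head(\s[^>]*)?>(.*?)</\s*head\s*>.*?<\s*body(\s[^>]*)?>(.*?)</\s*body\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns (head content, body content), or null when nothing matches.
    public static Tuple<string, string> Extract(string html)
    {
        Match m = HeadBodyPattern.Match(html);
        if (!m.Success) return null;
        return Tuple.Create(m.Groups[2].Value, m.Groups[4].Value);
    }

    public static void Main()
    {
        string html = "<html>\n<head>\n<title>AST</title>\n</head>\n<body>\n<p>hi</p>\n</body>\n</html>";
        Tuple<string, string> parts = Extract(html);
        Console.WriteLine("head: " + parts.Item1.Trim());
        Console.WriteLine("body: " + parts.Item2.Trim());
    }
}
```

With this option the input needs no preprocessing, whether or not the tags are followed by a line break.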

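Comments, attributes, and content all have to be taken into account, so this is quite a hassle. The simplest approach is to replace all comments, replace all HTML tags other than head and body (that is, replace their attributes), and replace all JS; then use a greedy regex to grab head and body; finally, put back everything that was replaced. There are plenty of earlier threads on this kind of problem.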
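The replace-then-restore idea above could be sketched roughly as follows. This is only an illustration under assumptions: it masks comments and `<script>` blocks only (not tag attributes), the names `MaskingSketch`, `Mask`, and `Unmask` are invented here, and the control characters `\u0001`/`\u0002` are assumed not to occur in the page.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MaskingSketch   // hypothetical helper, not from the thread
{
    // Matches HTML comments and script blocks, both of which may span lines.
    private static readonly Regex Noise = new Regex(
        @"<!--.*?-->|<script\b[^>]*>.*?</script\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Swaps each comment/script for a numbered placeholder and remembers the original.
    public static string Mask(string html, List<string> saved)
    {
        return Noise.Replace(html, m =>
        {
            saved.Add(m.Value);
            return "\u0001" + (saved.Count - 1) + "\u0002";
        });
    }

    // Puts the remembered originals back in place of the placeholders.
    public static string Unmask(string text, List<string> saved)
    {
        return Regex.Replace(text, "\u0001(\\d+)\u0002",
            m => saved[int.Parse(m.Groups[1].Value)]);
    }

    public static void Main()
    {
        // The comment contains a fake </head> that would end the match too early.
        string html = "<head><!-- </head> --><title>t</title></head><body>b</body>";
        var saved = new List<string>();
        string masked = Mask(html, saved);
        Match m = Regex.Match(masked, @"<head>(.*)</head>", RegexOptions.Singleline);
        Console.WriteLine(Unmask(m.Groups[1].Value, saved));
    }
}
```

Because the comment is hidden while the head/body pattern runs, a stray `</head>` inside it cannot cut the match short; the original text comes back when the placeholders are restored.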

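I probably don't have much say here!
Heh.
My level is the lowest in this thread, but I'll chip in anyway; it might be useful to you. You have already located the content inside <head> </head> <body > </body>, so why not write some XML to pull out the values inside? As I remember it, you use <call-param name="itemXPath">//head/(the content you want goes here)</call-param>
<call-param name="itemXPath">//table/body/(same idea)</call-param> Not sure whether this helps.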
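The `<call-param>` lines above look like configuration for some tool the thread does not name, so they are left as posted. The underlying XPath idea can be sketched with the standard `System.Xml` classes, assuming the page is well-formed XML. The names `XPathSketch` and `InnerXmlOf` are invented for this example. Note that a page with a DOCTYPE or a default `xmlns` (like the sample above) would need extra handling, for example an `XmlNamespaceManager`, which this sketch leaves out.

```csharp
using System;
using System.Xml;

public static class XPathSketch   // hypothetical helper, not from the thread
{
    // Loads well-formed markup and returns the inner XML of the first node
    // selected by the XPath expression, or null when nothing is selected.
    public static string InnerXmlOf(string xhtml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);   // throws XmlException if the markup is not well-formed
        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.InnerXml;
    }

    public static void Main()
    {
        string xhtml = "<html>\n<head>\n<title>AST</title>\n</head>\n<body>\n<p>hi</p>\n</body>\n</html>";
        Console.WriteLine("head: " + InnerXmlOf(xhtml, "/html/head"));
        Console.WriteLine("body: " + InnerXmlOf(xhtml, "/html/body"));
    }
}
```

This sidesteps the line-break problem entirely, but real-world HTML is often not well-formed, in which case the regex route remains the fallback.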

If you go with regex, just follow the reply by M2, a veteran on this forum, at
http://topic.csdn.net/t/20061108/11/5141765.html

private void button3_Click(object sender, EventArgs e)
        {
            // '.' does not match '\n' by default, so swap line breaks for a placeholder first.
            string inputString = this.textBox1.Text.Trim().Replace("\r\n", "#@$");
            StringBuilder sb = new StringBuilder();
            Regex reg = null;
            Match mch = null;
            reg = new Regex(@"<\s*?head(\s+.*?)?>(.*?)</head\s*?>.*?<\s*?body(\s+.*?)?>(.*?)</body.*?>", RegexOptions.IgnoreCase | RegexOptions.Compiled);
            for (mch = reg.Match(inputString); mch.Success; mch = mch.NextMatch())
            {
                sb.AppendLine("attribute of head：" + mch.Groups[1]);
                sb.AppendLine("attribute of body：" + mch.Groups[3]);
                sb.AppendLine("head：" + mch.Groups[2]);
                sb.AppendLine("body：" + mch.Groups[4]);
            }
            // Restore the line breaks (the escape is "\r\n", not "/r/n").
            MessageBox.Show(sb.ToString().Replace("#@$", "\r\n"));
        }
