Given the following file:
<html><head><body>
<table width="100%" border="0" cellspacing="0" cellpadding="0" bgcolor="#CCCCCC">
<td><font color="black" size=1>I l<font color="#FF0000">@</font>ve RuBoard</td>
<td valign="top" class="v2" align="right">
<a href="0201794276_ch02lev1sec3.html"><img src="FILES/previous.gif" width="62" height="15" border="0" align="absmiddle" alt="Previous Section"></a>
<a href="0201794276_ch03lev1sec1.html"><img src="FILES/next.gif" width="41" height="15" border="0" align="absmiddle" alt="Next Section"></a>
</td></table>
<br>
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td valign="top"><A NAME="ch03"></A>
<H2 class="docChapterTitle">Chapter 3. Customer Trust</H2>
<blockquote>
<p class="docText"><span class="docEmphasis">Before you trust a man, eat a peck of salt with him.</span></p><p class="docText">Proverb</p></blockquote>
<blockquote>
<p class="docText"><span class="docEmphasis">Traditional Web development has always set up customers as the "others." They paid the bills but somehow they were a foreign entity intruding on development. Such segregation has consistently backfired. Where developers wanted privacy they got scrutiny; where they wanted blind faith they got distrust. The walls that divide developers from customers sabotage the whole project. XP offers new practices of inclusion that might be hard to swallow at first but pay off immediately.</span></p></blockquote><ul></ul>
</td>
</tr>
</table>
<td></td>
<table width="100%" border="0" cellspacing="0" cellpadding="0" bgcolor="#CCCCCC">
<td><font color="black" size=1>I l<font color="#FF0000">@</font>ve RuBoard</td>
<td valign="top" class="v2" align="right">
<a href="0201794276_ch02lev1sec3.html"><img src="FILES/previous.gif" width="62" height="15" border="0" align="absmiddle" alt="Previous Section"></a>
<a href="0201794276_ch03lev1sec1.html"><img src="FILES/next.gif" width="41" height="15" border="0" align="absmiddle" alt="Next Section"></a>
</td></table>
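</body></html>
Use a regular expression to find the first and last <table></table> blocks and delete them along with their contents, including the <table> tags themselves.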
Replies:
string Pattern = @"<table(.*?)</table>";
RegexOptions _Option = RegexOptions.Singleline; // let '.' match newlines as well
Regex _REG = new Regex(Pattern, _Option);
// Note: this removes every <table>...</table>, including the middle one.
while (_REG.IsMatch(ReturnValue))
{
    ReturnValue = _REG.Replace(ReturnValue, "");
}
MessageBox.Show(ReturnValue);
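Here is a minimal sketch of what a single Regex.Replace call does with the pattern from the reply above. It runs on a hypothetical, trimmed three-table sample and is written as C# 9+ top-level statements. One call already removes every non-overlapping match, so the while loop is usually unnecessary, and the middle table is removed as well. The instance overload that takes a count stops after that many replacements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, trimmed stand-in for the question's HTML:
// top navigation table, chapter table, bottom navigation table.
string html =
    "<body>\n" +
    "<table id=\"top\"><td>nav</td></table>\n" +
    "<table id=\"mid\"><tr><td>Chapter 3</td></tr></table>\n" +
    "<table id=\"bottom\"><td>nav</td></table>\n" +
    "</body>";

// Same pattern as the reply; Singleline lets '.' match newlines.
Regex reg = new Regex(@"<table.*?</table>", RegexOptions.Singleline);

// A single Replace call removes every non-overlapping match,
// so all three tables disappear, including the middle one.
string allRemoved = reg.Replace(html, "");

// The (input, replacement, count) overload stops after 'count' matches;
// with count = 1 only the first table is removed.
string firstRemoved = reg.Replace(html, "", 1);

Console.WriteLine(allRemoved);
Console.WriteLine(firstRemoved);
```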
Step 1:
Get the contents of the middle table.
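string SubjectString = ""; // source string (the HTML above)
string table2 = null;      // will hold the second (middle) table
try {
    Regex RegexObj = new Regex("(?s)<table.*?>((?!table).)*</table>");
    Match MatchResults = RegexObj.Match(SubjectString);
    int tableIndex = 0;
    while (MatchResults.Success) {
        // The loop visits all three tables in turn; keep only the second one.
        tableIndex++;
        if (tableIndex == 2) table2 = MatchResults.Value;
        MatchResults = MatchResults.NextMatch();
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
Then replace everything between <body> and </body> in the source string with the contents of the second table.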
The replacement can be done like this:
string ResultString = null;
try {
    ResultString = Regex.Replace(SubjectString, "(?s)(?<=<body.*?>).*(?=</body>)", table2);
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
Here table2 is the contents of the second table.
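The two steps can be combined roughly as follows. This is a sketch that assumes C# 9+ top-level statements and uses a hypothetical, trimmed sample in place of the full HTML. It uses Matches instead of the Match/NextMatch loop. It also uses the instance Replace overload with a count of 1, so the body content is replaced only once. Literal `$` characters in table2 are doubled because `$` has special meaning in .NET replacement strings.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, trimmed stand-in for the question's HTML.
string subject =
    "<html><head><body>\n" +
    "<table bgcolor=\"#CCCCCC\"><td>Previous | Next</td></table>\n" +
    "<br>\n" +
    "<table width=\"100%\"><tr><td><H2>Chapter 3. Customer Trust</H2></td></tr></table>\n" +
    "<td></td>\n" +
    "<table bgcolor=\"#CCCCCC\"><td>Previous | Next</td></table>\n" +
    "</body></html>";

// Step 1: tables whose content never contains the word "table"
// (i.e. no nested tables); keep the second match.
Regex tableRegex = new Regex("(?s)<table.*?>((?!table).)*</table>");
MatchCollection tables = tableRegex.Matches(subject);
string table2 = tables.Count >= 2 ? tables[1].Value : "";

// Step 2: replace everything between <body...> and </body> with table2.
// '$' is special in replacement strings, so literal dollars are doubled;
// count = 1 guarantees a single replacement.
Regex bodyRegex = new Regex("(?s)(?<=<body.*?>).*(?=</body>)");
string result = bodyRegex.Replace(subject, table2.Replace("$", "$$"), 1);

Console.WriteLine(result);
```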
string Pattern = @"<table(.*?)</table>";
Regex reg = new Regex(@"<table.*?</table>",RegexOptions.Singleline);
string output = reg.Replace(input,"");
string Pattern = @"<table(.*?)</table>"; // remove this line (it is never used)
Regex reg = new Regex(@"<table.*?</table>",RegexOptions.Singleline);
string output = reg.Replace(input,"");
This is all you need:
string input = "..."; // the whole chunk of HTML from the question
Regex reg = new Regex(@"<table.*?</table>",RegexOptions.Singleline);
string output = reg.Replace(input,"");
But the poster wants to keep the contents of the second table!
Regex reg = new Regex(@"<table.*?</table>(.*?<table.*?</table>.*?)<table.*?</table>",RegexOptions.Singleline);
string output=reg.Replace(input,"$1");
This version should meet the original poster's requirements.
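Here is a quick check of the capture-group approach on a hypothetical, trimmed sample, written as C# 9+ top-level statements. The replacement keeps only group 1, which is everything between the first and last tables. It therefore assumes the page has exactly three top-level, non-nested tables. With more tables, the pattern would consume them three at a time.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, trimmed stand-in for the question's HTML.
string input =
    "<html><head><body>\n" +
    "<table bgcolor=\"#CCCCCC\"><td>nav</td></table>\n" +
    "<br>\n" +
    "<table><tr><td>Chapter 3. Customer Trust</td></tr></table>\n" +
    "<td></td>\n" +
    "<table bgcolor=\"#CCCCCC\"><td>nav</td></table>\n" +
    "</body></html>";

// Group 1 runs from the end of the first table to the start of the last one,
// so it holds the middle table plus the markup around it.
Regex reg = new Regex(@"<table.*?</table>(.*?<table.*?</table>.*?)<table.*?</table>",
                      RegexOptions.Singleline);

// "$1" writes back only group 1, dropping the first and last tables.
string output = reg.Replace(input, "$1");

Console.WriteLine(output);
```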
<html><head><META http-equiv="Content-Type" content="text/html"><!--SafClassName="docChapterTitle"--><!--SafTocEntry="Chapter 2. Project Estimating"--><link rel="STYLESHEET" type="text/css" href="FILES/style.css"><link rel="STYLESHEET" type="text/css" href="FILES/docsafari.css"></head><body></body></html>
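I just want to remove the contents of the first and last tables. Is there a simpler way?
In short, I need to remove specific tables from the file. These tables contain the string next.gif, they are top-level tables, and they have no nested tables inside.
Any suggestions from the experts?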
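string ResultString = null;
try {
    // Remove tables whose opening tag ends in bgcolor="#CCCCCC" and whose content has no nested "table".
    // Inside [...] a '|' is a literal character, not "or", so the class is written without pipes.
    ResultString = Regex.Replace(SubjectString, "(?s)<table[\\w=\"%\\s#]*? bgcolor=\"#CCCCCC\">((?!table).)*</table>", "");
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}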
string ResultString = null;
try {
ResultString = Regex.Replace(SubjectString, "(?s)<table((?!>).)*>((?!table).)*next\\.gif((?!table).)*</table>", "");
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
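Another option fits the description above, where the tables to remove contain next.gif and have no nested tables. The regex finds every table that contains no nested table, and ordinary C# code decides through a MatchEvaluator which ones to drop. This is a sketch only: the sample HTML and the RemoveNavTables name are hypothetical, and the code uses C# 9+ top-level statements. Because the pattern matches innermost tables, a nested inner table that mentions next.gif would also be removed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, trimmed sample; only the navigation tables reference next.gif.
string html =
    "<body>\n" +
    "<table bgcolor=\"#CCCCCC\"><td><a href=\"a.html\"><img src=\"FILES/next.gif\"></a></td></table>\n" +
    "<table><tr><td>Chapter 3. Customer Trust</td></tr></table>\n" +
    "<table bgcolor=\"#CCCCCC\"><td><a href=\"a.html\"><img src=\"FILES/next.gif\"></a></td></table>\n" +
    "</body>";

// Matches one table whose body contains no nested "<table" opening tag.
// (?i) also catches upper-case <TABLE>; (?s) lets '.' cross newlines.
Regex innermostTable = new Regex(@"(?is)<table\b[^>]*>(?:(?!<table\b).)*?</table>");

// Hypothetical helper: drop every matched table that mentions next.gif,
// keep all others unchanged.
string RemoveNavTables(string source) =>
    innermostTable.Replace(source, m => m.Value.Contains("next.gif") ? "" : m.Value);

string cleaned = RemoveNavTables(html);
Console.WriteLine(cleaned);
```

Because the keep-or-drop decision is plain C# code, the criterion can be changed (for example, to check for bgcolor="#CCCCCC") without touching the pattern.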