The HTML contains the following markup:
<TR>
<TD><BR>
<TABLE cellSpacing=0 cellPadding=6 width=300 align=left
border=0>
<TBODY>
<TR>
<TD>
<SCRIPT language=javascript src=""></SCRIPT>
<NOSCRIPT language=javascript><A
href="http://ads.addynamix.com/click/2-2125147-2"><IMG
src="" border=0></A></NOSCRIPT> </TD>
</TR></TBODY></TABLE>I'm
dreaming tonight <BR>I'm living back home <BR>Right <BR>Yeah
<BR><BR>Take me back to a south Tallahassee <BR>Down cross the
bridge to my sweet sassafrassy <BR>Can't stand up on my feet
in the city <BR>Gotta get back to the real nitty gritty
<BR><BR>Yes sir no sir <BR>Don't come close to my <BR>Home
sweet home <BR>Can't catch no dose <BR>Of my hot tail poon
tang sweetheart <BR>Sweathog ready to make a silk purse
<BR>From a J Paul Getty and his ear <BR>With her face in her
beer <BR><BR>Home sweet home <BR><BR>Get out in the field
<BR>Put the mule in the stable <BR>Ma she's a cookin' <BR>Put
the eats on the table <BR>Hate's in the city <BR>And my love's
in the meadow <BR>Hands on the plow <BR>And my feets in the
ghetto <BR><BR>Stand up sit down <BR>Don't do nothin' <BR>It
ain't no good when boss man's <BR>Stuffin' down their throats
<BR>For paper notes <BR>And their babies cry <BR>While cities
lie at their feet <BR>When you're rockin' the street
<BR><BR>Home sweet home <BR><BR>Mama take me home sweet home
<BR><BR>I was the last child <BR>I'm just a punk in the street
<BR><BR>I was the last child <BR>I'm just a punk in the street
<BR><BR>I was the last child <BR>I'm just a punk in the street
<BR><BR>I was the last child <BR>I'm just a punk in the street
<BR><BR>I was the last child <BR>I'm just a punk in the street
<BR><BR>I was the last child <BR>I'm just a punk in the street
<BR></TD>
</TR>
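I want to extract the part that was highlighted in red: the text after the nested </TABLE>, up to the closing </TD>.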
I tried @"(<TR[^>]*?><TD[^>]*?>)(?<conetnet>[^<]*)(</TD></TR>)", but it doesn't capture that part.
All I get is <TD>
<SCRIPT language=javascript src=""></SCRIPT>
<NOSCRIPT language=javascript><A
href="http://ads.addynamix.com/click/2-2125147-2"><IMG
src="" border=0></A></NOSCRIPT> </TD>
How should I change the pattern? Thanks, everyone.
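Why the original pattern misses the target: [^<]* cannot contain a '<', so the captured group can never span the <BR>, the nested table, or the <BR> tags inside the lyrics, and (</TD></TR>) also requires the two closing tags to be adjacent with no line break between them. The sketch below (class and method names are hypothetical, the input is a shortened stand-in for the real page) shows that on such markup the pattern skips the outer row entirely and only an innermost cell with no markup inside it can match.

```csharp
using System.Text.RegularExpressions;

public static class OriginalPatternDemo
{
    // The asker's pattern, copied verbatim (including the group name "conetnet").
    public const string Pattern =
        @"(<TR[^>]*?><TD[^>]*?>)(?<conetnet>[^<]*)(</TD></TR>)";

    // Returns the captured group of the first match, or null when nothing matches.
    public static string FirstCapture(string html)
    {
        Match m = Regex.Match(html, Pattern);
        return m.Success ? m.Groups["conetnet"].Value : null;
    }
}
```

For example, with a nested table whose inner cell is plain text, the first (and only) match is that inner cell, not the lyrics in the outer cell.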
<td[^>]*>(?:(?!<table|<td)[\s\S])*<table[^>]*>(?:(?!</table>)[\s\S])*</table>(?<content>(?:(?!</td>)[\s\S])+)</td>
Use this pattern with case-insensitive matching (RegexOptions.IgnoreCase) and it works.
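A minimal C# sketch of the reply above: the pattern is used as posted, compiled with RegexOptions.IgnoreCase, and the "content" group is returned. The class and method names are hypothetical, and the input is a shortened version of the sample markup. Because the pattern uses [\s\S] rather than ., it crosses line breaks without needing RegexOptions.Singleline.

```csharp
using System.Text.RegularExpressions;

public static class NestedTableExtractor
{
    // Pattern from the reply: skip up to the nested <table>, skip the whole table,
    // then capture everything up to the next </td>.
    private static readonly Regex Pattern = new Regex(
        @"<td[^>]*>(?:(?!<table|<td)[\s\S])*<table[^>]*>(?:(?!</table>)[\s\S])*</table>(?<content>(?:(?!</td>)[\s\S])+)</td>",
        RegexOptions.IgnoreCase);

    // Returns the text after the nested table, or null when the markup doesn't match.
    public static string ExtractAfterNestedTable(string html)
    {
        Match m = Pattern.Match(html);
        return m.Success ? m.Groups["content"].Value : null;
    }
}
```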
string str="<TR><TD> <BR> <TABLE cellSpacing=0 cellPadding=6 width=300 align=left border=0> <TBODY> <TR> <TD> <SCRIPT language=javascript src=\"\"> </SCRIPT> <NOSCRIPT language=javascript> <A href=\"http://ads.addynamix.com/click/2-2125147-2\"><IMG src=\"\" border=0> </A> </NOSCRIPT> </TD> </TR> </TBODY> </TABLE> gdfgsdgsgsd</TD></TR> ";
// IgnoreCase tolerates <tr>/<TR> etc.; Singleline lets . match the line breaks found in real HTML
System.Text.RegularExpressions.Regex reg=new System.Text.RegularExpressions.Regex(@"<TR>\s*<TD>\s*<BR>\s*<TABLE[^>].*?>.*?</TABLE>\s*(?<content>[^>].*?)\s*</TD>\s*</TR>",System.Text.RegularExpressions.RegexOptions.IgnoreCase | System.Text.RegularExpressions.RegexOptions.Singleline);
System.Text.RegularExpressions.MatchCollection m = reg.Matches(str); // find every match in str
for (int i = 0; i < m.Count; i++)
{
    Response.Write(m[i].Groups["content"].Value); // the text between </TABLE> and </TD>
}
The regex above extracts it; I tested it on the string above.
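The test string in that reply is on a single line, but the real markup spans several lines, and in .NET the . metacharacter does not match \n unless RegexOptions.Singleline is set. The sketch below (hypothetical names, shortened multi-line sample) runs that reply's pattern, with \s{0,} written as the equivalent \s*, once without and once with Singleline.

```csharp
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Same pattern as the reply above; \s* is equivalent to \s{0,}.
    public const string Pattern =
        @"<TR>\s*<TD>\s*<BR>\s*<TABLE[^>].*?>.*?</TABLE>\s*(?<content>[^>].*?)\s*</TD>\s*</TR>";

    // Returns the "content" group of the first match, or null when nothing matches.
    public static string Extract(string html, RegexOptions options)
    {
        Match m = Regex.Match(html, Pattern, options);
        return m.Success ? m.Groups["content"].Value : null;
    }
}
```

Without Singleline, .*?</TABLE> cannot step over the newline after the <TABLE ...> tag, so nothing matches; with it, the lyrics are captured (trailing whitespace is left to the \s* before </TD>).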
Correction:
(?<=</TABLE>)[\w\W]+?(?=</TD>) matches only up to the first </TD>.
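A sketch of the lookaround version in C# (hypothetical names): (?<=</TABLE>) makes the match start right after the first </TABLE> in the input, and the lazy [\w\W]+? stops at the first position followed by </TD>, so m.Value is already the wanted text and no named group is needed. RegexOptions.IgnoreCase is added here as an assumption, following the later advice in this thread.

```csharp
using System.Text.RegularExpressions;

public static class LookaroundExtractor
{
    // Lookbehind anchors after </TABLE>; lookahead stops before the first </TD>.
    // Neither </TABLE> nor </TD> is part of the returned value.
    private static readonly Regex Pattern = new Regex(
        @"(?<=</TABLE>)[\w\W]+?(?=</TD>)", RegexOptions.IgnoreCase);

    public static string Extract(string html)
    {
        Match m = Pattern.Match(html);
        return m.Success ? m.Value : null;
    }
}
```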
Removing HTML tags in C#:
Regex.Replace(content, @"<[\s\S]*?>", "", RegexOptions.IgnoreCase)
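The original string "\\<" + @"[\s\S]*?>" concatenates to \<[\s\S]*?>, which matches the same text as <[\s\S]*?> (the backslash only escapes the <). Applied directly to the extracted lyrics it also deletes every <BR> and runs the lines together, so this sketch first turns <BR> and <br/> into newlines; that extra step and the names below are assumptions, not part of the reply.

```csharp
using System.Text.RegularExpressions;

public static class HtmlTagStripper
{
    // Assumed extra step: <BR>, <br>, <br/> become line breaks before stripping.
    private static readonly Regex Br = new Regex(@"<br\s*/?>", RegexOptions.IgnoreCase);

    // The reply's tag pattern: a '<', then as few characters as possible
    // (including newlines), then '>'.
    private static readonly Regex Tag = new Regex(@"<[\s\S]*?>");

    public static string StripTags(string html)
    {
        string withBreaks = Br.Replace(html, "\n");
        return Tag.Replace(withBreaks, "");
    }
}
```

Run on the text captured by the patterns above, this yields the lyrics one line per <BR>.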
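In general it's best to add RegexOptions.Singleline and RegexOptions.IgnoreCase. Singleline only changes what . matches (it then matches \n too); patterns built from [\s\S] or [\w\W] already match newlines without it.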