Extracting links from HTML with a regular expression!

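I need to extract the links from HTML, along with the contents of <p> paragraphs. It would be even better if the alt text of the image links could be extracted too. Thanks in advance, everyone!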

Solutions

            string inputs = "<div class=\"box_01\"> <a href=\"http://tech.sina.com.cn/digi/dc/2011-05-18/09425539462.shtml\" target=\"_blank\"><img src=\"http://i1.sinaimg.cn/IT/U5311P2T1D5539462F2755DT20110518095231.jpg\" width=\"135\" height=\"85\" alt=\"徕卡昂贵镜头遭遇切片\" /></a><h3><a href=\"http://tech.sina.com.cn/digi/dc/2011-05-18/09425539462.shtml\" target=\"_blank\">徕卡昂贵镜头遭遇切片</a></h3><p>最近，国外某大学的学生为了自己的毕业设计，...</p> </div>";
            string patterns = @"(href|HREF|src|SRC|<p>)={1,}([""'^#][\w\S]*[""'>|</p>])";
            MatchCollection matches = Regex.Matches(inputs, patterns);
            foreach (Match match in matches)
            {
                Console.WriteLine("type:        {0}", match.Groups[1].Value);
                Console.WriteLine("href:        {0}", match.Groups[2].Value);
                Console.WriteLine("title:       {0}", match.Groups[3].Value);
                Console.WriteLine("Content:     {0}", match.Groups[4].Value);
                Console.WriteLine();
            }
This is my source code; inputs is my HTML markup. Thanks. Right now I just need a correct regex!
            string inputs = "<div class=\"box_01\"> <a href=\"http://tech.sina.com.cn/digi/dc/2011-05-18/09425539462.shtml\" target=\"_blank\"><img src=\"http://i1.sinaimg.cn/IT/U5311P2T1D5539462F2755DT20110518095231.jpg\" width=\"135\" height=\"85\" alt=\"徕卡昂贵镜头遭遇切片\" /></a><h3><a href=\"http://tech.sina.com.cn/digi/dc/2011-05-18/09425539462.shtml\" target=\"_blank\">徕卡昂贵镜头遭遇切片</a></h3><p>最近，国外某大学的学生为了自己的毕业设计，...</p> </div>";
            // Group 1 is the attribute name; group 2 is the attribute value, still wrapped in its quotes.
            string patterns = @"(?is)(href|src|alt)=+([""'^#][\w\S]*[""'>])";
            MatchCollection matches = Regex.Matches(inputs, patterns);
            foreach (Match match in matches)
            {
                Console.WriteLine("type:        {0}", match.Groups[1].Value);
                // The same group holds the href, src, or alt value, so the label is generic.
                Console.WriteLine("value:       {0}", match.Groups[2].Value);
                Console.WriteLine();
            }
"<div class=\"box_01\"> <a href=\"http://tech.sina.com.cn/digi/dc/2011-05-18/09425539462.shtml\" target=\"_blank\"><img src=\"http://i1.sinaimg.cn/IT/U5311P2T1D5539462F2755DT20110518095231.jpg\" width=\"135\" height=\"85\" alt=\"徕卡昂贵镜头遭遇切片\" /></a><h3><a href=\"http://tech.sina.com.cn/digi/dc/2011-05-18/09425539462.shtml\" target=\"_blank\">徕卡昂贵镜头遭遇切片</a></h3><p>最近，国外某大学的学生为了自己的毕业设计，...</p> </div>" Yes, there is~~
            string inputs = "<div class=\"box_01\"> <a href=\"http://tech.sina.com.cn/digi/dc/2011-05-18/09425539462.shtml\" target=\"_blank\"><img src=\"http://i1.sinaimg.cn/IT/U5311P2T1D5539462F2755DT20110518095231.jpg\" width=\"135\" height=\"85\" alt=\"徕卡昂贵镜头遭遇切片\" /></a><h3><a href=\"http://tech.sina.com.cn/digi/dc/2011-05-18/09425539462.shtml\" target=\"_blank\">徕卡昂贵镜头遭遇切片</a></h3><p>最近，国外某大学的学生为了自己的毕业设计，...</p> </div>";
            string patterns = @"(?is)((href|src)=(['""])*([^\s]+?)\3)|(<p>(.*?)</p>)|(alt=(['""])*([^\s]+?)\8)";
            MatchCollection matches = Regex.Matches(inputs, patterns);
            foreach (Match match in matches)
            {
                if (!string.IsNullOrEmpty(match.Groups[2].Value))
                {
                    Console.WriteLine("type:\t{0}", match.Groups[2].Value);
                    Console.WriteLine("href|src:\t{0}", match.Groups[4].Value);
                }
                else if (!string.IsNullOrEmpty(match.Groups[5].Value))
                {
                    Console.WriteLine("type:\tp");
                    Console.WriteLine("Content:\t{0}", match.Groups[6].Value);
                }
                else if (!string.IsNullOrEmpty(match.Groups[7].Value))
                {
                    Console.WriteLine("type:\talt");
                    Console.WriteLine("alt:\t{0}", match.Groups[9].Value);
                }
                Console.WriteLine();
            }
/*
type:   href
href|src:       http://tech.sina.com.cn/digi/dc/2011-05-18/09425539462.shtml

type:   src
href|src:       http://i1.sinaimg.cn/IT/U5311P2T1D5539462F2755DT20110518095231.jpg

type:   alt
alt:    徕卡昂贵镜头遭遇切片

type:   href
href|src:       http://tech.sina.com.cn/digi/dc/2011-05-18/09425539462.shtml

type:   p
Content:        最近，国外某大学的学生为了自己的毕业设计，...
*/
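For what it's worth, the same idea can probably be written with named groups so the branches are easier to tell apart than with group numbers. This is only a rough sketch, not a tested replacement: the class and method names (AttributeExtractor, Extract) and the sample HTML in Main are made up for illustration, and it assumes attribute values are always quoted and that <p> elements are not nested.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class AttributeExtractor
{
    // Either a quoted href/src/alt attribute, or a whole <p>...</p> element.
    // \k<q> requires the closing quote to be the same character as the opening one.
    private static readonly Regex Pattern = new Regex(
        @"\b(?<name>href|src|alt)\s*=\s*(?<q>[""'])(?<value>.*?)\k<q>|<p\b[^>]*>(?<text>.*?)</p>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns (type, value) pairs in the order they appear in the input.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var results = new List<KeyValuePair<string, string>>();
        foreach (Match m in Pattern.Matches(html))
        {
            if (m.Groups["name"].Success)
            {
                // Attribute branch: report the attribute name in lower case.
                results.Add(new KeyValuePair<string, string>(
                    m.Groups["name"].Value.ToLowerInvariant(), m.Groups["value"].Value));
            }
            else
            {
                // Paragraph branch: report the inner text of the <p> element.
                results.Add(new KeyValuePair<string, string>("p", m.Groups["text"].Value));
            }
        }
        return results;
    }

    public static void Main()
    {
        // Made-up sample input, not the HTML from the question.
        string html = "<a href=\"http://example.com/a.shtml\"><img src='http://example.com/a.jpg' alt=\"demo image\" /></a><p>Some text</p>";
        foreach (var item in Extract(html))
        {
            Console.WriteLine("{0}\t{1}", item.Key, item.Value);
        }
    }
}
```

One caveat with this sketch: the \b before the attribute name also matches names such as data-src, so a real page may need a stricter pattern or a proper HTML parser.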
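Is the HTML limited to just this <div></div> part?
If so, this should be easy to handle.
Hyperlink regex: http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
The <p> content should be fairly easy to extract: first use <p>.*</p> to pull out <p>最近，国外某大学的学生为了自己的毕业设计，...</p>, then remove <p> and </p> directly with Replace(). The alt text can be handled the same way, though it feels a bit laborious. Let's wait for an expert to post a more efficient regex.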
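A small sketch of how the hyperlink pattern above might be applied with Regex.Matches. The class and method names (UrlFinder, FindUrls) are hypothetical, and dropping duplicate URLs is my own assumption about what is wanted. Note that the pattern only covers http:// links, and its path character class includes a space, so in plain text it may run past the end of a URL; inside quoted attributes the closing quote stops it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class UrlFinder
{
    // The hyperlink pattern quoted in the post, unchanged.
    private const string UrlPattern = @"http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";

    // Returns each distinct URL found in the input, in order of first appearance.
    public static List<string> FindUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match m in Regex.Matches(html, UrlPattern))
        {
            if (!urls.Contains(m.Value))
            {
                urls.Add(m.Value);
            }
        }
        return urls;
    }

    public static void Main()
    {
        // A shortened version of the markup from the question.
        string html = "<a href=\"http://tech.sina.com.cn/digi/dc/2011-05-18/09425539462.shtml\" target=\"_blank\">"
                    + "<img src=\"http://i1.sinaimg.cn/IT/U5311P2T1D5539462F2755DT20110518095231.jpg\" /></a>";
        foreach (string url in FindUrls(html))
        {
            Console.WriteLine(url);
        }
    }
}
```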
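The two-step approach described above (match the whole element, then strip the tags with Replace()) could look roughly like this. It is a sketch under a few assumptions: the names TwoStepExtractor, FirstParagraph and FirstAlt are invented here, tags are assumed to be lower case, alt values are assumed to be double-quoted, and the lazy <p>.*?</p> is swapped in for <p>.*</p> so that two paragraphs on one page are not merged into a single match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class TwoStepExtractor
{
    // Step 1: match the first <p>...</p> element. Step 2: strip the tags with Replace().
    // Returns an empty string when no paragraph is found.
    public static string FirstParagraph(string html)
    {
        Match m = Regex.Match(html, @"<p>.*?</p>", RegexOptions.Singleline);
        if (!m.Success)
        {
            return string.Empty;
        }
        return m.Value.Replace("<p>", "").Replace("</p>", "");
    }

    // Same idea for alt: match alt="..." and then strip the attribute name and the quotes.
    // Returns an empty string when no alt attribute is found.
    public static string FirstAlt(string html)
    {
        Match m = Regex.Match(html, @"alt=""[^""]*""");
        if (!m.Success)
        {
            return string.Empty;
        }
        return m.Value.Replace("alt=\"", "").Replace("\"", "");
    }

    public static void Main()
    {
        // Made-up sample input, not the HTML from the question.
        string html = "<img src=\"a.jpg\" alt=\"demo\" /><p>one</p><p>two</p>";
        Console.WriteLine(FirstParagraph(html));
        Console.WriteLine(FirstAlt(html));
    }
}
```

A single capture group, such as <p>(.*?)</p>, would avoid the Replace() step entirely, which is essentially what the combined pattern posted earlier in the thread does.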