Regular expressions

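I am scraping a page and want to use a regular expression to extract the attributes of a tag. For example, given:
<a href="http://www.baidu.com" title="百度一下">百度</a>
how can I use a regular expression to get the href value, the title value, and the content of the <a> tag? This is urgent, I am waiting online!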

Solutions »

(?i)<a[^>]*href=(['"]?)(?<href>[^'"]+)\1[^>]*title=\1(?<title>[^'"]+)\1[^>]*>(?<value>[^<]+)</a>
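For anyone reading the pattern above for the first time, here is a rough sketch of the same expression spread over several lines with comments. It assumes .NET's RegexOptions.IgnorePatternWhitespace, so the line breaks and # comments are ignored by the engine. The class name PatternBreakdown and the method Extract are made up for illustration and are not from the original post.

```csharp
using System;
using System.Text.RegularExpressions;

static class PatternBreakdown
{
    // Same pattern as the one-liner above, only laid out with comments.
    public const string Pattern = @"
        <a[^>]*                  # opening tag, skip anything before href
        href=(['""]?)            # group 1: optional opening quote
        (?<href>[^'""]+)\1       # href value, then the same quote again
        [^>]*                    # anything between href and title
        title=\1                 # title must use the same quote style as href
        (?<title>[^'""]+)\1      # title value, then the closing quote
        [^>]*>                   # rest of the opening tag
        (?<value>[^<]+)          # link text, no nested tags allowed
        </a>";

    // Returns { href, title, value }, or null when the input does not match.
    public static string[] Extract(string html)
    {
        Match m = Regex.Match(html, Pattern,
            RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
        if (!m.Success) return null;
        return new[] { m.Groups["href"].Value, m.Groups["title"].Value, m.Groups["value"].Value };
    }

    static void Main()
    {
        string[] r = Extract(@"<a href=""http://www.baidu.com"" title=""百度一下"">百度</a>");
        Console.WriteLine("href: {0}  title: {1}  value: {2}", r[0], r[1], r[2]);
    }
}
```

Note that this layout requires href to come before title and the link text to contain no nested tags, the same as the one-line version.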
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Globalization;
using System.IO;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string html = @"<a href=""http://www.baidu.com"" title=""百度一下"">百度</a>";
            foreach (Match m in Regex.Matches(html, @"(?i)<(a)\s*[^=]+?=""(?<href>[^""]+)""\s*[^=]+?=""(?<title>[^""]+)"">(?<a>[^>]*?)</\1>"))
            {
                foreach (Capture c in m.Groups["href"].Captures)
                {
                    Console.WriteLine(c.Value);
                }
                Console.WriteLine("*************");
                foreach (Capture c in m.Groups["title"].Captures)
                {
                    Console.WriteLine(c.Value);
                }
                Console.WriteLine("*************");
                foreach (Capture c in m.Groups["a"].Captures)
                {
                    Console.WriteLine(c.Value);
                }
                Console.WriteLine("*************");
            }
        }
    }
}
Output:
http://www.baidu.com
*************
百度一下
*************
百度
*************
string str = "<a href=\"http://www.baidu.com\" title=\"百度一下\">百度</a>";
Regex re = new Regex(@"<a\s*href=\""(?<href>.*?)\""\s*title=\""(?<title>.*?)\"">(?<content>.*?)</a>", RegexOptions.None);
MatchCollection mc = re.Matches(str);
foreach (Match ma in mc)
{
    // ma.Groups["href"].Value is the href; result: http://www.baidu.com
    // ma.Groups["title"].Value is the title; result: 百度一下
    // ma.Groups["content"].Value is the text inside the <a> tag; result: 百度
}
void Main()
{
  string html = @"<a href=""http://www.baidu.com"" title=""百度一下"">百度</a>";
  foreach(Match m in Regex.Matches(html,@"(?i)<a[^>]*href=(['""]?)(?<href>[^'""]+)\1[^>]*title=\1(?<title>[^'""]+)\1[^>]*>(?<value>[^<]+)</a>"))
  {
    Console.WriteLine("href: {0}  title: {1}  value: {2}",m.Groups["href"].Value,m.Groups["title"].Value,m.Groups["value"].Value);
  }
  //href: http://www.baidu.com  title: 百度一下  value: 百度
}
Just joining in for fun!
(?is)(?<=a.*href=).*(?=\s)|(?<=a.*title=).*?(?=[>]|[\s])|(?<=a.*>).*?(?=</)
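Brother Cat, in title=\1(?<title>[^'""]+)\1, isn't \1 a poor fit here?
The two attributes may use different quote styles, for example <a href='xxx' title="...">.
That case can occur in real pages.
It can be avoided by giving title its own quote group: (?i)<a[^>]*href=(['"]?)(?<href>[^'"]+)\1[^>]*title=(['"]?)(?<title>[^'"]+)\2[^>]*>(?<value>[^<]+)</a>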
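To make the difference concrete, here is a small sketch that runs both patterns against a tag whose href uses single quotes and whose title uses double quotes. The class name MixedQuotesDemo and the helper Matches are hypothetical names chosen for this example. It relies on .NET numbering unnamed groups before named ones, which is why \2 refers to the second quote group.

```csharp
using System;
using System.Text.RegularExpressions;

static class MixedQuotesDemo
{
    // Original pattern: title reuses \1, so it must use the same quote as href.
    public const string SharedQuote =
        @"(?i)<a[^>]*href=(['""]?)(?<href>[^'""]+)\1[^>]*title=\1(?<title>[^'""]+)\1[^>]*>(?<value>[^<]+)</a>";

    // Revised pattern: title captures its own quote in unnamed group 2.
    public const string SeparateQuotes =
        @"(?i)<a[^>]*href=(['""]?)(?<href>[^'""]+)\1[^>]*title=(['""]?)(?<title>[^'""]+)\2[^>]*>(?<value>[^<]+)</a>";

    public static bool Matches(string pattern, string html)
    {
        return Regex.IsMatch(html, pattern);
    }

    static void Main()
    {
        // href in single quotes, title in double quotes
        string mixed = @"<a href='http://www.baidu.com' title=""百度一下"">百度</a>";
        Console.WriteLine(Matches(SharedQuote, mixed));    // False
        Console.WriteLine(Matches(SeparateQuotes, mixed)); // True
    }
}
```

Both patterns still match the original example, where the two attributes use the same quote style.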
http://www.jb51.net/article/21853.htm
You can take a look at this article; I found it quite useful.