How to read an HTML page's source and extract content from a specific position

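First question: how do I read the source of an HTML web page in C#?
Second question: if the HTML source contains a fragment like the one below, how do I extract the text "森林蜘蛛" from <h1>森林蜘蛛</h1> and assign it to a variable?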
<div class="h1">
<h1>森林蜘蛛</h1>
<div class="clearfix"></div>
</div>

Solutions »


A regular expression along these lines:
(?<=<div class="h1">\s*<h1>)\w*(?=</h1>)
Add further conditions if you have other requirements :)
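To show how that pattern could be applied from C#, here is a minimal sketch. The class name HeadingExtractor and the method name ExtractHeading are made up for illustration; the only library call used is Regex.Match from System.Text.RegularExpressions. It relies on .NET allowing variable-length lookbehind and on \w matching Unicode letters, so Chinese text is captured.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeadingExtractor
{
    // Hypothetical helper: returns the text of the <h1> that directly follows
    // <div class="h1">, or null when the pattern does not match.
    public static string ExtractHeading(string html)
    {
        // Lookbehind/lookahead keep the tags out of the match, so m.Value is the text only.
        Match m = Regex.Match(html, @"(?<=<div class=""h1"">\s*<h1>)\w*(?=</h1>)");
        return m.Success ? m.Value : null;
    }
}
```

Calling HeadingExtractor.ExtractHeading(html) on the fragment from the question should give the heading text, which can then be assigned to a variable. Note that \w* stops at spaces and punctuation, so a heading containing those would need a looser pattern such as [^<]*.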
http://hi.baidu.com/obuxiangnizou/blog/item/a28867138ee409c2c3fd78db.html
// Filter the parsed source for nodes that have the given tag name and the given attribute value.
public NodeList getNodeListByAttribute(string nodeName, string attributeName, string attributeValue, Parser parser)
{
    NodeFilter nodeFilter = new TagNameFilter(nodeName);
    NodeFilter nameFilter = new HasAttributeFilter(attributeName, attributeValue);
    AndFilter andFilter = new AndFilter(nodeFilter, nameFilter);
    NodeList nodeList = parser.ExtractAllNodesThatMatch(andFilter);
    return nodeList;
}
// Requires: using System; using System.IO; using System.Net; using System.Text;
// urlParam: the page URL; coding: the encoding name. Returns the page source, or null on failure.
public string GetStringByResponse(string urlParam, string coding)
{
    if (string.IsNullOrEmpty(coding))   // default to UTF-8 when no encoding is passed in
        coding = "UTF-8";
    WebRequest request;
    try
    {
        request = WebRequest.Create(urlParam);  // create a request for the page to download
    }
    catch (UriFormatException)
    {
        return null;  // invalid URL: nothing to download
    }
    request.Credentials = CredentialCache.DefaultCredentials;  // set credentials in case the server requires them
    HttpWebResponse response = null;
    Stream dataStream = null;          // response stream from the server
    StreamReader reader = null;
    string responseFromServer = null;  // response body as a string
    try
    {
        response = (HttpWebResponse)request.GetResponse();  // send the request and get the response
        if (response.StatusCode == HttpStatusCode.OK)  // request succeeded
        {
            dataStream = response.GetResponseStream();
            reader = new StreamReader(dataStream, Encoding.GetEncoding(coding));
            responseFromServer = reader.ReadToEnd();  // read the whole response body
        }
    }
    catch (WebException)
    {
        return null;
    }
    finally
    {
        if (reader != null)
            reader.Close();
        if (dataStream != null)
            dataStream.Close();
        if (response != null)
            response.Close();
    }
    return responseFromServer;
}
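By the way, getNodeListByAttribute() requires a reference to a DLL; it is available in my download resources.
If the HTML is a file you wrote yourself, you don't need GetStringByResponse(); just read the file with an IO stream and return the HTML source as a string.
If anything is unclear, contact me on QQ: 464582858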
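For the local-file case mentioned above, a rough sketch might look like the following. The names LocalHtml and ReadHtmlFile are invented for this example, and the UTF-8 fallback simply mirrors the default used in GetStringByResponse(); adjust the encoding to whatever the file was actually saved in.

```csharp
using System.IO;
using System.Text;

public static class LocalHtml
{
    // Hypothetical helper: reads an HTML file from disk and returns its source as a string.
    public static string ReadHtmlFile(string path, string coding)
    {
        if (string.IsNullOrEmpty(coding))
            coding = "UTF-8";  // assumed default, same as the network version

        // using closes the reader (and the underlying file) even if reading throws
        using (StreamReader reader = new StreamReader(path, Encoding.GetEncoding(coding)))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The returned string can be fed to the same extraction code as a downloaded page.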
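        /// <summary>
        /// Gets the HTML text at the given URL. Returns "" on failure.
        /// </summary>
        public static string Get_Html(string Url)
        {
            // using disposes the WebClient even if the download throws
            using (System.Net.WebClient wc = new System.Net.WebClient())
            {
                try
                {
                    // credentials must be set before the request is sent
                    wc.Credentials = System.Net.CredentialCache.DefaultCredentials;
                    Byte[] pageData = wc.DownloadData(Url);
                    // Encoding.Default is the system default; use the page's actual encoding if it differs
                    return System.Text.Encoding.Default.GetString(pageData);
                }
                catch (System.Exception)
                {
                    return "";  // download or decoding failed
                }
            }
        }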
Your div can be handled perfectly well as XML. If what you call HTML is actually well-formed XML data, process it with XML tools.
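As one possible illustration of the XML route, the sketch below uses LINQ to XML (System.Xml.Linq). The names XmlHeading and ExtractHeading are hypothetical. It assumes the input is a well-formed XML fragment like the one in the question; XElement.Parse throws an XmlException on typical real-world HTML with unclosed tags.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class XmlHeading
{
    // Hypothetical helper: finds the first <div class="h1"> and returns the text
    // of its direct <h1> child, or null if either element is missing.
    public static string ExtractHeading(string xml)
    {
        XElement root = XElement.Parse(xml);  // throws XmlException if not well-formed

        XElement div = root.DescendantsAndSelf("div")
            .FirstOrDefault(d => (string)d.Attribute("class") == "h1");
        if (div == null)
            return null;

        XElement h1 = div.Element("h1");
        return h1 == null ? null : h1.Value;
    }
}
```

Compared with a regular expression, this does not depend on the exact whitespace or attribute quoting between the tags, at the cost of requiring well-formed input.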