I want to use ASP.NET to scrape the data inside one div on a Sina page. Can anyone help?

As the title says: the page is http://finance.sina.com.hk/stock/foreignIndices.html. I need to pull those few figures out of it and show them on my own page. How should I write this? Right now all I can do is use request and response to fetch the full HTML, and I don't know what to do next.

Solutions »

Once you have the HTML, just match it with a regular expression. So tell me, which div do you want?

The content inside <div id="foreignIndicesAll">...</div>. My current code looks like this:

    string str = string.Empty;
    try
    {
        WebRequest request = WebRequest.Create("http://finance.sina.com.hk/stock/foreignIndices.html");
        WebResponse response = request.GetResponse();
        Stream stream = response.GetResponseStream();
        StreamReader reader = new StreamReader(stream, System.Text.Encoding.GetEncoding("GB2312"));
        str = reader.ReadToEnd();
        reader.Close();
        reader.Dispose();
        response.Close();
        tb.Text = str;
    }
    catch (Exception ex)
    {
        str = ex.Message;
    }

How do I add the regular expression to this?

    <div id="foreignIndicesAll">[\s\S]*?</div>

Watch the encoding when you fetch the page:

    string url = "http://finance.sina.com.hk/stock/foreignIndices.html";
    WebRequest request = WebRequest.Create(url);      // request the URL
    WebResponse response = request.GetResponse();     // get the response data
    StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("big5"));
    string tempStr = reader.ReadToEnd();
    string id = "foreignIndicesAll";
    // Alternative for testing: read saved HTML from a text file
    //string tempStr = File.ReadAllText(@"C:\Documents and Settings\Administrator\桌面\Test.txt", Encoding.GetEncoding("GB2312"));
    string pattern = @"(?isx)<div(?:(?!(?:id=|</?div\b)).)*id=(['""]?)" + id + @"\1[^>]*>(?><div[^>]*> (?<Open>)|</div> (?<-Open>)|(?:(?!</?div\b).)*)*(?(Open)(?!))</div>";
    tempStr = Regex.Match(tempStr, pattern).Value;

The div you want contains nested divs, so a simple pattern of the <div[^>]*?></div> kind will not work.

    Regex regBody = new Regex(@"<div id="foreignIndicesAll">[\s\S]*?</div>");

Writing it like this gives a compile error. How should it be written?

    Regex regBody = new Regex("<div id=\"foreignIndicesAll\">[\\s\\S]*?</div>");

It is already posted above. Also, the page encoding is Encoding.GetEncoding("big5"), not your GB2312.

    string pattern = @"(?isx)<div(?:(?!(?:id=|</?div\b)).)*id=(['""]?)foreignIndicesAll\1[^>]*>(?><div[^>]*> (?<Open>)|</div> (?<-Open>)|(?:(?!</?div\b).)*)*(?(Open)(?!))</div>";

@Return_false, I only just saw your message after refreshing. Thank you very much.
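To make the point about nested divs concrete, here is a minimal sketch that runs both approaches on a small hand-written HTML string instead of the live page. The class name `DivExtractor`, the method names, and the sample markup are made up for illustration; the balanced pattern is a simplified variant of the one posted above (it uses .NET balancing groups and adds `Regex.Escape` around the id), not a drop-in copy of it.

```csharp
using System.Text.RegularExpressions;

public static class DivExtractor
{
    // Lazy pattern: stops at the FIRST </div>, so a nested div cuts the match short.
    public static string ExtractLazy(string html, string id)
    {
        string pattern = "(?is)<div id=\"" + Regex.Escape(id) + "\">[\\s\\S]*?</div>";
        return Regex.Match(html, pattern).Value;
    }

    // Balanced pattern: counts opening and closing div tags with a .NET balancing group.
    public static string ExtractBalanced(string html, string id)
    {
        string pattern =
            @"(?is)<div\b[^>]*\bid=(['""]?)" + Regex.Escape(id) + @"\1[^>]*>" + // opening tag with the wanted id
            @"(?>" +
                @"<div\b[^>]*>(?<Open>)" +    // nested opening div: push onto the Open stack
                @"|</div>(?<-Open>)" +        // closing div: pop (fails if nothing is open)
                @"|(?:(?!</?div\b).)+" +      // any other text up to the next div tag
            @")*" +
            @"(?(Open)(?!))" +                // fail if a nested div is still open
            @"</div>";                        // the closing tag of the wanted div
        return Regex.Match(html, pattern).Value; // empty string when nothing matches
    }
}
```

With the sample `<div id="outer"><p>a</p><div class="inner">b</div><span>c</span></div><div id="other">d</div>`, `ExtractLazy(html, "outer")` stops after `b</div>`, while `ExtractBalanced(html, "outer")` returns the whole outer div including the `<span>`. Regular expressions remain fragile against real-world HTML, so treat this as a sketch for well-behaved pages only.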
Yes, it showed up after a refresh. Thanks, Return_false. One more question: can this be done with something like XMLHttpRequest? When I try to fetch the page that way I get "access denied", yet I see plenty of scraper programs that use this approach.

    string Url = "http://finance.sina.com.hk/stock/foreignIndices.html";
    // Build the HttpWebRequest object; note that it must come from Create, not new
    HttpWebRequest wReq = (HttpWebRequest)WebRequest.Create(Url);
    // Define a proxy; only needed if you go online through a proxy
    //WebProxy proxy = new WebProxy("proxyServer:intPort", true);
    //proxy.Credentials = new NetworkCredential("UserName", "UserPwd", "UserDomain");
    //wReq.Proxy = proxy;
    // Spoof browser data so that anti-scraping filters do not reject the request
    wReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322; .NET CLR 2.0.50215; CrazyCoder.cn;www.aub.org.cn)";
    // For completeness, the next line can be added to get past the POST check common on ASP sites
    wReq.Referer = Url; // the referring page: the home page of the site being scraped
    // Get the HTTP response
    HttpWebResponse wResp = wReq.GetResponse() as HttpWebResponse;
    // Get the response stream
    System.IO.Stream respStream = wResp.GetResponseStream();
    System.IO.StreamReader reader = new System.IO.StreamReader(respStream, Encoding.GetEncoding("big5"));
    string tempStr = reader.ReadToEnd();
    // Closing the reader releases its resources
    reader.Close();
    reader.Dispose();

Why am I only seeing this now o(︶︿︶)o Sigh, this network. Could anyone recommend a reasonably comprehensive site for learning ASP.NET? I am only half-trained: simple things are fine, but anything slightly more involved and I am stuck. Really, thank you, Return_false. Can I give you some extra points?
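For what it's worth, the fetch code in this thread closes the reader by hand and never closes the response if an exception is thrown. A hedged sketch of the same steps with `using` blocks follows. `PageFetcher`, `ReadAll`, `Fetch`, and the placeholder User-Agent string are names invented for this example. On .NET Core / .NET 5+, `Encoding.GetEncoding("big5")` only works after registering the code-pages encoding provider, and `WebRequest` is marked obsolete there; on the classic .NET Framework used in this thread it works as shown.

```csharp
using System.IO;
using System.Net;
using System.Text;

public static class PageFetcher
{
    // Reads an entire stream as text; the using block disposes the reader
    // (and with it the underlying stream) even if ReadToEnd throws.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Downloads a page and decodes it with the named encoding, e.g. "big5".
    public static string Fetch(string url, string encodingName)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = "Mozilla/5.0 (compatible; ExampleFetcher/1.0)"; // placeholder value, an assumption
        request.Referer = url;

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        {
            return ReadAll(stream, Encoding.GetEncoding(encodingName));
        }
    }
}
```

A call such as `PageFetcher.Fetch("http://finance.sina.com.hk/stock/foreignIndices.html", "big5")` would then return the decoded HTML, ready to be passed to the regular expression from the earlier replies.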