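Development environment: VS2005
Language: C++
Question: How do I fetch the correct page content with a socket?
Description:
I am fetching a web page over a raw socket.
The request headers I send are: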
GET http://reg.email.163.com/mailregAll/reg0.jsp?from=126mail HTTP/1.1
Accept: text/html,image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
Accept-Language: zh-cn
User-Agent:Mozilla/4.0
Host: reg.email.163.com/mailregAll/reg0.jsp?from=126mail:80
Connection: close
The server returns:
HTTP/1.0 403 Forbidden
Server: squid/2.5.STABLE10
Mime-Version: 1.0
Date: Thu, 14 Jan 2010 10:34:33 GMT
Content-Type: text/html
Content-Length: 1158
Expires: Thu, 14 Jan 2010 10:34:33 GMT
X-Squid-Error: ERR_ACCESS_DENIED 0
X-Cache: MISS from mimg.163.com
Proxy-Connection: close

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=gb2312">
<TITLE>错误:您所请求的网址(URL)无法获取</TITLE>
<STYLE type="text/css"><!--BODY{background-color:#ffffff;font-family:verdana,sans-serif}PRE{font-family:sans-serif}--></STYLE>
</HEAD><BODY>
<H1>错误</H1>
<H2>您所请求的网址(URL)无法获取</H2>
<HR noshade size="1px">
<P>
当尝试读取以下网址(URL)时:
<A HREF="http://reg.email.163.com/mailregAll/reg0.jsp?">http://reg.email.163.com/mailregAll/reg0.jsp?</A>
<P>
发生了下列的错误:
<UL>
<LI>
<STRONG>
Access Denied.
<BR>拒绝访问
</STRONG>
<P>
Access control configuration prevents your request from
being allowed at this time. Please contact your service provider if
you feel this is incorrect.
<BR>
当前的存取控制设定禁止您的请求被接受,
如果您觉得这是错误的,请与您网路服务的提供者联系。
</UL>
</P>
<P>本缓存服务器管理员:<A HREF="mailto:webmaster">webmaster</A>
<BR clear="all">
<HR noshade size="1px">
<ADDRESS>
Generated Thu, 14 Jan 2010 10:34:33 GMT by mimg.163.com (squid/2.5.STABLE10)
</ADDRESS>
</BODY></HTML>
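The response above has the usual HTTP layout: a status line, header lines, a blank line, then the body. As a minimal sketch of separating those parts in code, assuming the whole response is already in one string, something like the following could be used. `HttpResponse` and `ParseHttpResponse` are hypothetical names introduced here for illustration, not part of the original post.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper type, not from the original post.
struct HttpResponse {
    int status;           // numeric status code, e.g. 403
    std::string headers;  // status line plus header lines, without the blank line
    std::string body;     // everything after the blank line
};

// Splits a raw response at the first blank line ("\r\n\r\n") and reads the
// status code from the status line. Returns false if the header block is
// incomplete or the status code cannot be read.
bool ParseHttpResponse(const std::string& raw, HttpResponse& out)
{
    const std::string::size_type sep = raw.find("\r\n\r\n");
    if (sep == std::string::npos)
        return false;
    out.headers = raw.substr(0, sep);
    out.body = raw.substr(sep + 4);

    // The status line looks like "HTTP/1.0 403 Forbidden".
    const std::string::size_type sp = out.headers.find(' ');
    if (sp == std::string::npos)
        return false;
    out.status = std::atoi(out.headers.c_str() + sp + 1);
    return out.status != 0;
}
```

Checking `status` before showing the body makes it obvious when the proxy has answered with an error page instead of the requested page.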
Solutions »
// Open a TCP socket and connect to IP:Port. Returns the socket id, or -1 on failure.
// Example: int socketId = SocketOpen("127.0.0.1", 80, 0);
// The Type parameter is currently unused.
int CEmailRegDlg::SocketOpen(char * IP,int Port,int Type)
{
    SOCKET socketId;
    struct sockaddr_in serv_addr;
    int status;
    socketId=socket(AF_INET,SOCK_STREAM,0);
    // SOCKET is an unsigned type, so compare against INVALID_SOCKET instead of testing for < 0
    if(socketId==INVALID_SOCKET)
    {
        MessageBox(_T("[ERROR]Create a socket failed!"));
        return -1;
    }
    memset(&serv_addr,0,sizeof(serv_addr));
    serv_addr.sin_family=AF_INET;
    serv_addr.sin_addr.s_addr = inet_addr(IP);
    serv_addr.sin_port = htons((USHORT)Port);
    status=connect(socketId,(struct sockaddr*)&serv_addr,sizeof(serv_addr));
    if(status!=0)
    {
        MessageBox(_T("[ERROR]Connecting failed!"));
        closesocket(socketId);
        return -1;
    }
    return (int)socketId;
}
// Resolve a host name to an IP address
char * CEmailRegDlg::GetIPByURL(char * url)
{
    struct hostent *host;
    WSADATA wsaData;
    int ret;
    // Winsock must be initialized before gethostbyname is called. It is not
    // cleaned up here because the caller still needs it for socket(), connect() and recv().
    // (The original WSACleanup() call sat after the return statements and never ran.)
    ret = WSAStartup(0x0202, &wsaData);
    if(ret) {
        return 0;
    }
    // url must be a bare host name such as "reg.email.163.com", without "http://" or a path
    host = gethostbyname(url);
    if(host == NULL) {
        MessageBox(_T("[ERROR]error in gethostbyname!"));
        return 0;
    }
    // inet_ntoa returns a pointer to a static buffer; copy the string if it
    // has to survive another inet_ntoa call
    return inet_ntoa(*(struct in_addr*)host->h_addr_list[0]);
}
void CEmailRegDlg::OnBnClickedButtonreg()
{
    // Registration pages:
    // 126.com: http://reg.email.163.com/mailregAll/reg0.jsp?from=126mail
    // 163.com: http://reg.email.163.com/mailregAll/reg0.jsp?from=163mail
    // tom.com: http://bjcgi.tom.com/cgi-bin/tom_reg.cgi
    int selectReg=0;
    selectReg = mEmailSufix.GetCurSel(); // index of the current list selection
    // regWebsite must be a bare host name: gethostbyname cannot resolve a name with "http://" in front
    switch(selectReg)
    {
    case 0:
        regURLSufix=_T("http://reg.email.163.com/mailregAll/reg0.jsp?from=126mail");
        regWebsite=_T("reg.email.163.com");
        break;
    case 1:
        regURLSufix=_T("http://reg.email.163.com/mailregAll/reg0.jsp?from=163mail");
        regWebsite=_T("reg.email.163.com");
        break;
    case 2:
        regURLSufix=_T("http://bjcgi.tom.com/cgi-bin/tom_reg.cgi");
        regWebsite=_T("bjcgi.tom.com");
        break;
    default:
        return; // nothing selected
    }
    // Resolve the host name to an IP address
    USES_CONVERSION;
    char * str1=GetIPByURL(T2A(regWebsite)); // IP
    if(str1==NULL)
        return;
    // Open the socket
    int socketId=SocketOpen(str1,80,0);
    if(socketId<0)
        return;
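    // Build the HTTP request headers. The request line carries only the path and
    // the Host header carries only the host name; both are cut out of the full URL.
    // (The original code put the whole URL, path included, into Host.)
    int slash=regURLSufix.Find(_T('/'),7); // first '/' after "http://"
    CString host=(slash<0) ? regURLSufix.Mid(7) : regURLSufix.Mid(7,slash-7);
    CString path=(slash<0) ? CString(_T("/")) : regURLSufix.Mid(slash);
    CString str3=_T("GET ")+path+_T(" HTTP/1.1\r\n");
    str3=str3+_T("Accept: text/html,image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*\r\n");
    str3=str3+_T("Accept-Language: zh-cn\r\n");
    str3=str3+_T("User-Agent: Mozilla/4.0\r\n");
    str3=str3+_T("Host: ")+host+_T("\r\n");
    str3=str3+_T("Connection: close\r\n\r\n");
    SetDlgItemText(IDC_EDITHTTPRequestHead, str3);
    char* protocolHead=T2A(str3); // GET fetches the document at the given path
    // Once the connection is established, send the request
    send(socketId, protocolHead,(int)strlen(protocolHead),0);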
    // After sending the request, wait for the response.
    // select is polled in a loop to check whether data has arrived.
    //
    // select reports the state of one or more sockets.
    // int select(int nfds, fd_set FAR *readfds, fd_set FAR *writefds, fd_set FAR *exceptfds, const struct timeval FAR *timeout);
    // nfds      - ignored by the Windows Sockets API
    // readfds   - sockets to check for readability
    // writefds  - sockets to check for writability
    // exceptfds - sockets to check for errors
    // timeout   - maximum time to wait
    // Returns the number of ready sockets (> 0), 0 on timeout, or SOCKET_ERROR on failure.
    fd_set fds_r; // set of sockets to watch
    int status;
    char recvBuf[4096];
    CStringA response; // accumulates the whole response
    for(;;)
    {
        struct timeval tm = {100,1000}; // 100 s + 1000 us
        FD_ZERO(&fds_r);          // reset the set
        FD_SET(socketId,&fds_r);  // watch socketId
        status=select(socketId+1, &fds_r, 0, 0, &tm);
        if(status <= 0 || !FD_ISSET(socketId, &fds_r))
            break; // timeout or error
        // One recv returns only what has arrived so far; keep reading until
        // the server closes the connection (recv returns 0) or an error occurs.
        int n=recv(socketId,recvBuf,sizeof(recvBuf),0);
        if(n <= 0)
            break;
        response.Append(recvBuf,n);
    }
    closesocket(socketId);
    SetDlgItemText(IDC_EDITHTTPResponseHead, CString(response));
}
HTTP/1.1 200 OK
Server: nginx/0.5.33
Date: Fri, 15 Jan 2010 00:24:13 GMT
Content-Type: text/html
Content-Length: 957
Last-Modified: Mon, 05 Oct 2009 10:07:26 GMT
Connection: close
Accept-Ranges: bytes

<html><head><link rel="stylesheet" type="text/css" href="http://60.191.124.233:8080/css.css?aimt=235" /></head>
<script type="text/javascript"> var pp = "235&pre="+(new Date()).getTime(); var s=String(window.location.href); var
host=escape(s.substring(7,s.indexOf('/',7))); var ref=escape(document.referrer); s = escape(s); function loadfr(){
document.getElementById("fr1").src =
"http://60.191.124.233/dnsC.aspx?AIMT="+s+"&host="+host+"&refer="+ref+"&server="+pp; } function refreshPage(){
document.location = "http://60.191.124.233/dnsB.aspx?AIMT="+s+"&host="+host+"&refer="+ref+"&server="+pp; } if
(self.location == top.location){
document.location="http://60.191.124.233/dnsA.aspx?AIMT="+s+"&host="+host+"&refer="+ref+"&server="+pp; } else {
setTimeout("loadfr()",500); setTimeout("loadfr()",1500); setTimeout("refreshPage()",2200); } </script><frameset
rows="*,0"><frame id="main" src=""><frame id="fr1" src=""></frameset><body></body></html>
The complete request headers:
GET /mailregAll/reg0.jsp?from=126mail HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
Accept-Language: zh-cn
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; CIBA)
Host: reg.email.163.com
Connection: Close
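Compared with the request in the question, the headers above put only the path on the request line and only the host name in `Host`. A minimal sketch of deriving both from a full URL, assuming a plain `http://` URL with no port or user info, might look like this. `BuildGetRequest` is a hypothetical helper written for illustration, and it emits only a reduced set of headers.

```cpp
#include <string>

// Hypothetical helper: builds a GET request for an "http://host/path" URL.
// Returns an empty string if the URL does not start with "http://".
std::string BuildGetRequest(const std::string& url)
{
    const std::string scheme = "http://";
    if (url.compare(0, scheme.size(), scheme) != 0)
        return std::string();

    // Everything up to the first '/' after the scheme is the host;
    // the rest (including the query string) is the path.
    const std::string::size_type slash = url.find('/', scheme.size());
    std::string host;
    std::string path;
    if (slash == std::string::npos) {
        host = url.substr(scheme.size());
        path = "/";
    } else {
        host = url.substr(scheme.size(), slash - scheme.size());
        path = url.substr(slash);
    }

    return "GET " + path + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Accept: */*\r\n"
           "Connection: close\r\n"
           "\r\n";
}
```

For `http://reg.email.163.com/mailregAll/reg0.jsp?from=126mail` this yields a request line of `GET /mailregAll/reg0.jsp?from=126mail HTTP/1.1` and a `Host: reg.email.163.com` header, matching the headers listed above.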
The remaining problem is how to receive all of the TCP packets.
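A single `recv` call returns only the bytes that have arrived so far, so the usual approach with `Connection: close` is to keep calling it until it returns 0, which means the server has closed the connection. The sketch below shows that loop in a socket-independent form. `ReceiveAll` and its `recvFn` parameter are hypothetical names; `recvFn(buf, len)` stands in for a call such as `recv(socketId, buf, len, 0)`.

```cpp
#include <string>

// Reads until the peer closes the connection.
// recvFn(buf, len) is a stand-in for recv(): it returns the number of bytes
// read (> 0), 0 when the connection has been closed, or a negative value on error.
// Returns true on a clean close, false on error; received bytes are appended to out.
template <typename RecvFn>
bool ReceiveAll(RecvFn recvFn, std::string& out)
{
    char buf[4096];
    for (;;) {
        const int n = recvFn(buf, static_cast<int>(sizeof(buf)));
        if (n == 0)
            return true;   // connection closed: the response is complete
        if (n < 0)
            return false;  // socket error
        out.append(buf, static_cast<std::string::size_type>(n));
    }
}
```

With Winsock, `recvFn` could be a small function object that holds the socket and forwards to `recv`. If the connection were kept alive instead of closed, the loop would have to stop based on `Content-Length` rather than on `recv` returning 0.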