There is a simple example at http://www.tommstudio.com/ViewArticles.aspx?ID=921 that shows how to fetch the source of a web page.
============================================
The simplest way to get the source of a web page is to use the functions in the WinInet unit:
uses WinInet;
function GetWebPage(const Url: string): string;
var
  Session,
  HttpFile: HINTERNET;
  szSizeBuffer: Pointer;
  dwLengthSizeBuffer: DWord;
  dwReserved: DWord;
  dwFileSize: DWord;
  dwBytesRead: DWord;
  Contents: PChar;
begin
  Session := InternetOpen('', 0, nil, nil, 0);
  HttpFile := InternetOpenUrl(Session, PChar(Url), nil, 0, 0, 0);
  dwLengthSizeBuffer := 1024;
  HttpQueryInfo(HttpFile, 5, szSizeBuffer, dwLengthSizeBuffer, dwReserved);
  GetMem(Contents, dwFileSize);
  InternetReadFile(HttpFile, Contents, dwFileSize, dwBytesRead);
  InternetCloseHandle(HttpFile);
  InternetCloseHandle(Session);
  Result := StrPas(Contents);
  FreeMem(Contents);
end;
To use it, simply display the returned source:
Memo1.Text := GetWebPage('http://www.tommstudio.com/');
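============================================
After analyzing this code, I found that the HttpQueryInfo call seems to serve no purpose (its result is never used), and that dwFileSize is never initialized before it is passed to GetMem and InternetReadFile.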
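As an illustration of what the HttpQueryInfo call would need in order to be useful, here is a minimal sketch. It assumes the intent was to read the Content-Length header (query level 5 is HTTP_QUERY_CONTENT_LENGTH). `TryGetContentLength` is a hypothetical helper name, not something from the quoted article.

```pascal
uses Windows, WinInet;

// Hypothetical helper, not part of the quoted article.
// Returns True and fills Size when the server sent a Content-Length header.
function TryGetContentLength(HttpFile: HINTERNET; out Size: DWORD): Boolean;
var
  BufLen, Index: DWORD;
begin
  Size := 0;
  BufLen := SizeOf(Size);  // the buffer is the DWORD itself
  Index := 0;              // header index, 0 = first matching header
  // HTTP_QUERY_FLAG_NUMBER asks WinInet to return the value as a DWORD
  // instead of as text.
  Result := HttpQueryInfo(HttpFile,
    HTTP_QUERY_CONTENT_LENGTH or HTTP_QUERY_FLAG_NUMBER,
    @Size, BufLen, Index);
end;
```

Even with this in place, a read loop is still needed: dynamically generated pages are often sent without a Content-Length header, in which case the call returns False.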
I modified the code as follows:
function GetWebPage(const Url: string): string;
var
  Session,                 // InternetOpen, InternetOpenUrl
  HttpFile: HINTERNET;     // InternetOpenUrl, HttpQueryInfo, InternetReadFile
  dwFileSize: DWord;       // buffer size for GetMem and InternetReadFile
  dwBytesRead: DWord;      // InternetReadFile
  Contents: PAnsiChar;     // GetMem, InternetReadFile
  funstatus: Boolean;
  t, Raw: AnsiString;      // current chunk, and all bytes received so far
begin
  dwFileSize := 10000;
  GetMem(Contents, dwFileSize);
  Session := InternetOpen('', 0, nil, nil, 0);
  HttpFile := InternetOpenUrl(Session, PChar(Url), nil, 0, 0, 0);
  // HttpQueryInfo(HttpFile,5,szSizeBuffer,dwLengthSizeBuffer,dwReserved);
  Raw := '';
  repeat
    funstatus := InternetReadFile(HttpFile, Contents, dwFileSize, dwBytesRead);
    if funstatus and (dwBytesRead <> 0) then
    begin
      // Copy exactly the bytes received. StrPas would stop at the first #0
      // and could pick up stale data left over from a previous read.
      SetString(t, Contents, dwBytesRead);
      Raw := Raw + t;
    end;
  until (not funstatus) or (dwBytesRead = 0);
  Result := string(Raw);
  if not funstatus then
    Result := 'Error! Maybe can''t get all data out!' + #13 + Result;
  InternetCloseHandle(HttpFile);
  InternetCloseHandle(Session);
  FreeMem(Contents);
end;
I tried it on several forums and web pages and it works fine, but it fails on the following page: http://p078.ezboard.com/bcellofun. IE opens this page without any problem. The error is:
Server Maintenance Alert
p078.ezboard.com
The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request <em><a href="/bcellofun">GET /bcellofun</a></em>.<p> Reason: <strong>Error reading from remote server</strong></p>
We're sorry, there appears to have been an error processing your request. Please Go Back to your previous page. You may resubmit your request.
If you see this server maintence alert for more than four minutes, please go to Server Status forum. Monday, 27-Feb-2006 18:40:39 PST
Apache/2.0.52 (Unix)
Cleaned up, the relevant part of the error reads:
The proxy server could not handle the request GET /bcellofun.
Reason: Error reading from remote server
As you can see, this forum runs on Unix + Apache, and the problem seems to have something to do with a proxy?
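I looked up InternetOpen and InternetOpenUrl in MSDN and in wininet.pas.
dwAccessType: 0 means INTERNET_OPEN_TYPE_PRECONFIG
dwFlags: 0 has the same value as INTERNET_INVALID_PORT_NUMBER, but that constant is a port number, not a flag; as a dwFlags argument, 0 simply means that no flags are set
I am not sure whether these settings are correct. Thanks, everyone.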
That's all.
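For readers checking these arguments against wininet.pas, the two calls can be written with named constants, as in the sketch below. It also passes a non-empty agent string. That part is an assumption: the posted code sends an empty agent string while IE sends a real one, and some servers treat the two differently. Whether this has anything to do with the ezboard error is untested. `OpenPage` and the agent string value are illustrative only.

```pascal
uses Windows, WinInet;

// Hypothetical helper: opens Url and returns the request handle (nil on failure).
// The caller closes both the returned handle and Session with InternetCloseHandle.
function OpenPage(const Url: string; out Session: HINTERNET): HINTERNET;
const
  // Assumed agent string, chosen only for illustration.
  Agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';
begin
  Result := nil;
  // INTERNET_OPEN_TYPE_PRECONFIG (= 0): take proxy settings from the registry.
  Session := InternetOpen(Agent, INTERNET_OPEN_TYPE_PRECONFIG, nil, nil, 0);
  if Session = nil then
    Exit;
  // INTERNET_FLAG_RELOAD asks WinInet to fetch from the server, not the cache.
  Result := InternetOpenUrl(Session, PChar(Url), nil, 0, INTERNET_FLAG_RELOAD, 0);
  if Result = nil then
  begin
    InternetCloseHandle(Session);
    Session := nil;
  end;
end;
```

Checking both handles for nil before reading also makes it easier to tell a failed connection apart from an error page returned by the server, which is what the ezboard output above appears to be.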
Solutions
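Given a list of web pages, is there a tool that can fetch each page on the list and save it under a specified file name?
Offline browsers analyze a site automatically and download everything, but that is not what I need. I only want certain specified pages. FlashGet can fetch web pages, but it cannot assign file names in batch.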
That's all.
http://bbs.2ccc.com/attachments/2006/bb3903809_20063314651.rar
That's all.
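For the request above (fetch a fixed list of pages and save each one under a chosen file name), a small console program is one possible approach. The sketch below is not the tool from the attachment, whose contents are not shown here. It assumes a hypothetical list format with one "URL filename" pair per line, separated by a space, and uses URLDownloadToFile from the UrlMon unit instead of the WinInet read loop.

```pascal
program FetchList;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes, UrlMon;

// Assumed list format: "<URL> <file name>" on each line, separated by a space.
procedure FetchPages(const ListFile: string);
var
  Lines: TStringList;
  i, p: Integer;
  Line, Url, FileName: string;
begin
  Lines := TStringList.Create;
  try
    Lines.LoadFromFile(ListFile);
    for i := 0 to Lines.Count - 1 do
    begin
      Line := Trim(Lines[i]);
      p := Pos(' ', Line);
      if p = 0 then
        Continue;  // skip blank lines and lines without a file name
      Url := Copy(Line, 1, p - 1);
      FileName := Trim(Copy(Line, p + 1, MaxInt));
      // URLDownloadToFile returns S_OK when the page was saved.
      if URLDownloadToFile(nil, PChar(Url), PChar(FileName), 0, nil) = S_OK then
        Writeln('saved  ', Url, ' -> ', FileName)
      else
        Writeln('FAILED ', Url);
    end;
  finally
    Lines.Free;
  end;
end;

begin
  if ParamCount = 1 then
    FetchPages(ParamStr(1))
  else
    Writeln('usage: FetchList <listfile>');
end.
```

Run it as `FetchList pages.txt`, where `pages.txt` is a hypothetical list file in the format described above. Pages that fail to download are reported and skipped, so one bad URL does not stop the rest of the batch.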