I want to extract the download link for a file from a web page. The page's HTML looks like this:
<HTML>
<!-- Lotus-Domino (Release 5.0.6a - January 17, 2001 on Windows NT/Intel) -->
<HEAD>
<TITLE>&#187; 2005年度会议安排</TITLE></HEAD>
<BODY TEXT="000000" BGCOLOR="ffffff">
....
<BR>
<A HREF="/mail/cdd.nsf/0559a9510a955b0d482566a3000d4507/c2997b5d8d712843482570d600374b6f/$FILE/_068o30d8jpvn17h5v2f8ui4uaqk9rphgjnen17cfd5ko3ac9i_.doc"><IMG SRC="/mail/lsq.nsf/0559a9510a955b0d482566a3000d4507/c2997b5d8d712843482570d600374b6f/Body/0.39A2?OpenElement&FieldElemFormat=gif" WIDTH=205 HEIGHT=48 BORDER=0></A><BR>
</UL><BR>
<BR>
</BODY>
</HTML>
How can I extract the following value? The linked files are always in doc or xls format:
"/mail/cdd.nsf/0559a9510a955b0d482566a3000d4507/c2997b5d8d712843482570d600374b6f/$FILE/_068o30d8jpvn17h5v2f8ui4uaqk9rphgjnen17cfd5ko3ac9i_.doc"
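
One possible approach, sketched in Python since the post does not say which language is in use: parse the page with the standard library's `html.parser` and collect the `HREF` of every `<A>` tag whose path ends in `.doc` or `.xls`. The names `DownloadLinkParser` and `extract_download_links` are made up for this illustration, and the sketch assumes the page source has already been downloaded into a string.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# File extensions named in the question.
WANTED_EXTENSIONS = (".doc", ".xls")


class DownloadLinkParser(HTMLParser):
    """Collects href values of <a> tags that point at .doc or .xls files."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag != "a":
            return
        for name, value in attrs:
            if name != "href" or not value:
                continue
            # Test only the path, so a trailing query string does not hide the extension.
            path = urlsplit(value).path
            if path.lower().endswith(WANTED_EXTENSIONS):
                self.links.append(value)


def extract_download_links(html_text):
    """Return all .doc/.xls link targets found in html_text, in document order."""
    parser = DownloadLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Calling `extract_download_links(page_source)` on the page above should return a list containing the `/mail/cdd.nsf/.../$FILE/..._.doc` path. That path is relative to the server root, so it still has to be joined with the site's address (for example with `urllib.parse.urljoin`) before the file can be downloaded. A parser is used here rather than a single regular expression because it tolerates the mixed-case tags, unquoted attributes and stray text in this kind of generated HTML.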