I wrote my own web crawler (code attached) and have a question

import java.util.Vector;

// Thread-safe store of URLs waiting to be crawled.
public class URLStore {
    private final Vector<String> urls;

    public URLStore() {
        urls = new Vector<String>();
        urls.add("http://www.google.com/"); // seed URL
    }

    public synchronized boolean isEmpty() {
        return urls.isEmpty();
    }

    // Blocks until a URL is available, then removes and returns the oldest one.
    public synchronized String popUrl() {
        while (isEmpty()) {
            try {
                System.out.println("The Vector is empty!");
                wait();
            } catch (InterruptedException ex) {
                // Ignored: keep waiting for the next URL.
            }
        }
        return urls.remove(0);
    }

    // Adds a URL and wakes up one waiting crawler thread.
    public synchronized void pushUrl(String url) {
        urls.add(url);
        notify();
    }
}

import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.htmlparser.parserapplications.StringExtractor;
import org.htmlparser.util.ParserException;

// Crawler thread: takes a URL from the store, prints the page text,
// and pushes every link found in the extracted text back into the store.
public class GetContent extends Thread {
    // Matches scheme://... up to a closing '>', which is stripped afterwards.
    private static final Pattern LINK = Pattern.compile("[a-zA-Z]+://[^\\s>]*>");
    private final URLStore st;

    public GetContent(URLStore x) {
        st = x;
    }

    public void run() {
        while (true) {
            StringExtractor se = new StringExtractor(st.popUrl());
            String linkedText;
            String pureText;
            try {
                linkedText = se.extractStrings(true);  // text with links included
                pureText = se.extractStrings(false);   // text only
            } catch (ParserException ex) {
                ex.printStackTrace();
                continue; // skip pages that cannot be parsed
            }
            System.out.println(pureText);
            System.out.println("-----------------");
            Matcher m = LINK.matcher(linkedText);
            while (m.find()) {
                String tmp = m.group().replace(">", "");
                System.out.println(tmp);
                st.pushUrl(tmp);
            }
        }
    }
}

My question:
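The program can basically crawl, but it has no exception handling and no filtering. Right now I use a Vector as the URL container, but filtering out pages that have already been crawled seems awkward with a Vector. From what I found online, a Hashtable looks like a good fit, but as a student I have not managed to get the code working. Could someone show me how to use a Hashtable to filter out sites that have already been crawled? Many thanks.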

Solutions »

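Take a look at java.util.Map in the Java API; it may help.
A crawler... I am still learning this myself. Searching Google might turn something up.
   So you mean replacing the Vector with a Map?
Take a look at this.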
http://www.open-open.com/68.htm
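
On the filtering question: instead of replacing the Vector with a Map, one option is to keep a queue for the pending URLs and add a hash-based set that remembers every URL ever accepted. The following is a minimal sketch, not a drop-in fix. The class name FilteringUrlStore and the method pendingCount are made up for illustration, and URLs are compared as plain strings, so http://a.com and http://a.com/ would count as different pages.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical variant of URLStore: a FIFO queue of pending URLs plus a set
// of every URL ever accepted, so each URL is handed out at most once.
class FilteringUrlStore {
    private final Queue<String> pending = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();

    // Queues the URL only if it has never been accepted before.
    // Returns true if it was queued, false if it was filtered out.
    public synchronized boolean pushUrl(String url) {
        if (url == null || !seen.add(url)) {
            return false; // null or already seen
        }
        pending.add(url);
        notifyAll(); // wake crawler threads waiting in popUrl()
        return true;
    }

    // Blocks until a URL is available, then removes and returns the oldest one.
    // The URL stays in 'seen', so pushing it again later is rejected.
    public synchronized String popUrl() throws InterruptedException {
        while (pending.isEmpty()) {
            wait();
        }
        return pending.remove();
    }

    // Number of URLs still waiting to be crawled.
    public synchronized int pendingCount() {
        return pending.size();
    }
}
```

A Hashtable<String, Boolean> with containsKey and put would work the same way; HashSet is used here only because nothing needs to be stored per URL besides the URL itself. The crawler thread would call pushUrl for every link it finds and let the store decide what is new.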
