How to extract the important part of a URL with a regular expression - 调试易


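How can I use a regular expression to extract the important part of a URL?
For example, from http://agsfeke.com/haha I want to get agsfeke, and from http://www.baidu.com I want to get only baidu.
Could a regex expert give me some pointers, ideally with a working example?
Thanks in advance, everyone.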

Solutions »


That depends on how you define "important information". For example:
http://hk.finance.yahoo.com, which part do you want?
http://del.icio.us, which part do you want?
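To make the ambiguity concrete, here is a minimal sketch that uses java.net.URI to pull out the host and split it into its dot-separated labels. The class name HostLabels and the helper labels are made up for illustration; which label counts as "important" is still a decision the asker has to make.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

public class HostLabels {
    // Hypothetical helper: returns the dot-separated labels of the URL's host.
    public static List<String> labels(String url) {
        String host = URI.create(url).getHost();
        return Arrays.asList(host.split("\\."));
    }

    public static void main(String[] args) {
        // Three labels in one case, four in the other: no fixed position is "the name".
        System.out.println(labels("http://hk.finance.yahoo.com"));
        System.out.println(labels("http://del.icio.us"));
    }
}
```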
http://agsfeke.com/haha
http://www.baidu.com
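Really there are only these two cases, right?
With www or without?
String regex = "http://(?:www\\.)?(.*?)\\.com";
Matcher m = Pattern.compile(regex).matcher(url);
while (m.find()) {
    // "www." is consumed outside the capture, so group 1 is just the name
    String str = m.group(1);
}
The . does need escaping (written \\. in a Java string literal); unescaped, it matches any character. Making www. optional in a single pattern also avoids an alternation whose first branch would capture "www.baidu". Give it a try.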
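import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlName {
    public static void main(String[] args) {
        String inputString = "http://www.baidu.com";
        // Group 2 is the name between the optional "www." and the suffix; the dots are escaped.
        Pattern pattern = Pattern.compile("(//www\\.|//)(.*?)(\\.com|\\.net|\\.org)");
        Matcher matcher = pattern.matcher(inputString);
        while (matcher.find()) {
            System.out.println(matcher.group(2));
        }
    }
}
Adapt it to your own situation.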
Roughly like this:
String url = "http://www.wwwagsfekecom.he.com/haha";
Pattern p = Pattern.compile("(.*?://)?(www\\.|bbs\\.)?(.*?)\\.(com|cn).*");
Matcher m = p.matcher(url);
while (m.find()) {
    System.out.println(m.group(3));
}
For other cases such as a new. prefix or a .org suffix, just add them to the alternations yourself.
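Pulling the ideas from the replies above together, a minimal self-contained sketch might look like the following. The class name SiteName, the method siteName, and the short suffix list are assumptions for illustration. It only handles hosts with a single label between an optional www. and the suffix, so a host like hk.finance.yahoo.com is deliberately not matched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SiteName {
    // Optional scheme, optional "www.", the name (no dots or slashes), an escaped
    // dot, one of a few suffixes, then end of host. The suffix list is an
    // assumption; extend it as needed.
    private static final Pattern NAME = Pattern.compile(
            "^(?:[a-z][a-z0-9+.-]*://)?(?:www\\.)?([^./]+)\\.(?:com|net|org|cn)(?:[/:?#]|$)",
            Pattern.CASE_INSENSITIVE);

    // Returns the captured name, or null when the URL does not fit the pattern.
    public static String siteName(String url) {
        Matcher m = NAME.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(siteName("http://agsfeke.com/haha"));
        System.out.println(siteName("http://www.baidu.com"));
    }
}
```

Using [^./]+ instead of .*? keeps the capture from running past a dot or into the path, which is why multi-label hosts fall through to null rather than producing a misleading result.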
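How should I put this.
What I am trying to build is a crawler that starts at a site's home page and follows links down level by level. To avoid following links to unrelated sites, I added a rule that a link is not followed when its domain differs. The problem now is this:
there is a site at
http://fujifilm.jp/index.html
and one of the links on it points to
http://www.fujifilm.co.jp/corporate/index.html
The domain has changed. The page is actually still related, but the crawler will not follow it.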
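One possible way to loosen the rule is to compare only the label that sits just left of the suffix, rather than the whole domain. The sketch below is a heuristic under stated assumptions: the class SameSite, the methods siteLabel and sameSite, and the tiny hard-coded suffix list are all made up for illustration. A real crawler would probably want a fuller suffix source such as the Public Suffix List.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class SameSite {
    // Assumed suffix list for illustration only. Longer entries come first so
    // that "co.jp" is tried before "jp".
    private static final List<String> SUFFIXES =
            Arrays.asList("co.jp", "com.cn", "com", "net", "org", "cn", "jp");

    // Returns the label just left of the suffix (e.g. "fujifilm"), or null
    // when the host is missing or uses a suffix not in the list.
    public static String siteLabel(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return null;
        }
        host = host.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            if (host.endsWith("." + suffix)) {
                String rest = host.substring(0, host.length() - suffix.length() - 1);
                return rest.substring(rest.lastIndexOf('.') + 1);
            }
        }
        return null;
    }

    // Two URLs count as the same site when their labels are equal.
    public static boolean sameSite(String a, String b) {
        String label = siteLabel(a);
        return label != null && label.equals(siteLabel(b));
    }

    public static void main(String[] args) {
        System.out.println(sameSite("http://fujifilm.jp/index.html",
                "http://www.fujifilm.co.jp/corporate/index.html"));
        System.out.println(sameSite("http://fujifilm.jp/index.html",
                "http://www.baidu.com"));
    }
}
```

The trade-off is that this treats any two hosts sharing a label as related, so unrelated owners of the same name under different suffixes would also be followed.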