Parsing a txt file in Java

I have a text file that looks like this:
1 02
http://news.xinhuanet.com/edu/2006-03/08/content_4276043.htm<html>
web page code
</html>2 08
http://news.xinhuanet.com/edu/2006-03/08/content_4276043.htm<html>
web page code
</html>
.
.
.
.
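There are over 1,000 paragraphs like this in total. In each one I want to remove the <html> and </html> tags along with everything between them, and keep everything else. I've thought about it for a long time and still can't work out how to do it. Could someone help? Thanks in advance.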

Answers


replaceAll("<html></html>", ""); just a suggestion. Following this thread.
First read the whole file into a string, do the replacement, then write the result back out to a txt file.
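A minimal sketch of that read, replace, write sequence. It assumes the file fits in memory and is UTF-8 encoded; the class name and the default file names `input.txt` / `output.txt` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StripHtmlBlocks {
    // (?s) lets '.' match line breaks; '.*?' is lazy, so each <html>...</html> pair is removed separately
    static String strip(String text) {
        return text.replaceAll("(?s)<html>.*?</html>", "");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file names; pass real paths as arguments
        Path in = Paths.get(args.length > 0 ? args[0] : "input.txt");
        Path out = Paths.get(args.length > 1 ? args[1] : "output.txt");
        // Assumes UTF-8; use e.g. Charset.forName("GBK") if the file uses another encoding
        String text = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        Files.write(out, strip(text).getBytes(StandardCharsets.UTF_8));
    }
}
```

Run it as `java StripHtmlBlocks input.txt output.txt`. In the sample the next record starts right after `</html>`, so replacing with `"\n"` instead of `""` would keep `2 08` on its own line.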
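To match each <html>...</html> block together with its content:
str = str.replaceAll("(?s)<html>.*?</html>", ""); // (?s): '.' also matches line breaks; '.*?': stop at the nearest </html>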
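The pattern `<html>.*</html>` on its own has two problems with this file: by default `.` does not match line breaks, and a greedy `.*` runs to the last `</html>` in the file. This small demo, using a shortened hypothetical sample in place of the real file, compares the three variants.

```java
public class RegexDemo {
    // Shortened stand-in for the file: two records, each with a multi-line <html> block
    static final String SAMPLE = "1 02\nurl1<html>\ncode\n</html>2 08\nurl2<html>\ncode\n</html>\n";

    static String remove(String regex) {
        return SAMPLE.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // No (?s): '.' stops at '\n', so the multi-line blocks are not matched and nothing changes
        System.out.println(remove("<html>.*</html>").equals(SAMPLE)); // true
        // (?s) with greedy '.*': one match spans from the first <html> to the last </html>
        System.out.print(remove("(?s)<html>.*</html>"));
        // (?s) with lazy '.*?': each block is removed on its own
        System.out.print(remove("(?s)<html>.*?</html>"));
    }
}
```

Only the last variant keeps the second record's `2 08` and `url2` lines.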
There is an open-source HTML parsing library that can do this: htmlparser.
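htmlparser's own API is not shown here. As a lighter alternative (a swapped-in technique, not htmlparser), a plain `indexOf` scan removes the blocks without regex or a parser library, assuming the blocks are never nested and the tags are written exactly as `<html>` and `</html>`. The class name is hypothetical.

```java
public class IndexOfStripper {
    // Removes every <html>...</html> span by scanning with indexOf.
    // Assumes blocks are not nested and tags appear exactly as "<html>" / "</html>".
    static String strip(String text) {
        final String open = "<html>";
        final String close = "</html>";
        StringBuilder out = new StringBuilder(text.length());
        int pos = 0;
        while (true) {
            int start = text.indexOf(open, pos);
            if (start < 0) break; // no more blocks
            int end = text.indexOf(close, start + open.length());
            if (end < 0) break; // unterminated block: keep the rest as-is
            out.append(text, pos, start); // keep the text before the block
            pos = end + close.length();   // skip past </html>
        }
        out.append(text.substring(pos)); // keep the tail
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(strip("1 02\nurl1<html>\ncode\n</html>2 08\nurl2<html>\ncode\n</html>\n"));
    }
}
```

Unlike a greedy regex, this cannot accidentally swallow several records, because each search for `</html>` starts right after the `<html>` it belongs to.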
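String regEx = "</?[^>]+>"; // alternative: "<textarea>(\\s|.)*</textarea>"
Matcher m = Pattern.compile(regEx).matcher(str); // java.util.regex.Pattern / Matcher
while (m.find()) {
    System.out.println(m.group()); // print each tag that was found
}
System.out.println(m.replaceAll("")); // str with all tags removed (text between tags is kept)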
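Note that `</?[^>]+>` removes only the tags themselves, not the text between `<html>` and `</html>`, so on its own it does not do what the question asks. A small check, with a hypothetical class name and a shortened sample:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagOnlyDemo {
    // Removes every <...> or </...> tag but leaves the text between tags untouched
    static String stripTags(String str) {
        Matcher m = Pattern.compile("</?[^>]+>").matcher(str);
        return m.replaceAll("");
    }

    public static void main(String[] args) {
        // The inner "code" line survives; only <html> and </html> disappear
        System.out.println(stripTags("url<html>\ncode\n</html>2 08"));
    }
}
```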