A question about extracting information from web pages

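I want to collect some useful information from web pages. I have already downloaded the pages with code and converted them to text. For example: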
<td height="25" colspan="2">company<a href="company.asp?id=2236" target="_blank">福州欣烨电子有限公司</a></td>
</tr>
<tr> </tr>
<tr>
<td colspan="2">address=：福州金山大道580号</td>
</tr>
<tr>
<td width="60%">telephoto：0591286756</td>
<td width="40%" align="right">
What I want to extract: 福州欣烨电子有限公司 福州金山大道580号 0591286756
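My approach is to use the indexOf() method: read the file line by line and, when a line contains a keyword (keywords: address, telephoto), take the text that follows it. Here is the code I wrote to get the address:
import java.io.BufferedReader;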
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
public class Getfilethin {
    public static void main(String[] args) {
        File file = new File("c:/text4.txt");
        String keyWords = "address=";
        // try-with-resources closes the BufferedReader even if reading fails
        try (BufferedReader br2 = new BufferedReader(
                new InputStreamReader(new FileInputStream(file)))) {
            String line;
            // readLine() returns null at end of file, so check before using the line
            while ((line = br2.readLine()) != null) {
                int index1 = line.indexOf(keyWords);
                if (index1 != -1) {
                    // print everything after the keyword and stop at the first match
                    String str2 = line.substring(index1 + keyWords.length());
                    System.out.println(str2);
                    break;
                }
            }
        } catch (Exception e) {
            System.out.print("\nCaught an exception!");
            e.printStackTrace();
        }
    }
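}
But after writing it I found that when the source file contains several keywords, extraction gets troublesome. On top of that, I don't know how much text to cut out after each keyword. So I'm asking whether anyone here has a better approach. Please help me out.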

Solutions »

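Use a regular expression: (start characters)(?<neirong>.*)(end characters)
The neirong group (the name means "content") holds the information you want. This handles a single field; for several fields, just run one search per field.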
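To make the regex suggestion concrete, here is a minimal sketch in Java. The class name RegexFieldExtractor, the helper extract, and the three patterns are my own assumptions based on the sample HTML in the question, not something from the original reply. I also use [^<]* instead of .* so the match stops at the next tag rather than running to the end of the line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the patterns assume the markup shown in the question.
class RegexFieldExtractor {
    // Start marker, then the named group "neirong", then '<' as the end marker.
    // [^<]* cannot cross a tag boundary, so the value ends where the next tag begins.
    static final Pattern COMPANY = Pattern.compile("target=\"_blank\">(?<neirong>[^<]*)<");
    static final Pattern ADDRESS = Pattern.compile("address=：?(?<neirong>[^<]*)<");
    static final Pattern PHONE = Pattern.compile("telephoto：(?<neirong>[^<]*)<");

    // Returns the first value matched by the pattern, or null if there is none.
    static String extract(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group("neirong").trim() : null;
    }

    public static void main(String[] args) {
        String html = "<td height=\"25\" colspan=\"2\">company"
                + "<a href=\"company.asp?id=2236\" target=\"_blank\">福州欣烨电子有限公司</a></td>"
                + "<td colspan=\"2\">address=：福州金山大道580号</td>"
                + "<td width=\"60%\">telephoto：0591286756</td>";
        System.out.println(extract(COMPANY, html));
        System.out.println(extract(ADDRESS, html));
        System.out.println(extract(PHONE, html));
    }
}
```

One pattern per field keeps each search independent, which is what "one search per field" amounts to. If a page lists several companies, calling m.find() in a loop instead of once should return each match in turn.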
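1 Regular expressions are the right way to do this.
2 If you want to implement it yourself, here is a suggestion:
Look at this line: int index1 = str1.indexOf(keyWords); it only ever finds the first occurrence. You should record the position you have already searched up to, for example int position = 0; and after each indexOf call advance position to the current location. Then use int index1 = str1.indexOf(keyWords, position); to look for the next occurrence starting from the current position.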
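The position idea can be sketched roughly as follows. This is only an illustration under my own assumptions: the class IndexOfFieldExtractor, the method extractAll, and the endMarker parameter are hypothetical names, and the second phone number in main is made-up sample data. Cutting the value at the next "<" is one possible answer to the "how much text to take" problem, assuming each value is followed by a tag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class illustrating indexOf(keyWords, position).
class IndexOfFieldExtractor {
    // Collects the text after every occurrence of keyWords, up to the next endMarker.
    static List<String> extractAll(String text, String keyWords, String endMarker) {
        if (keyWords.isEmpty()) {
            throw new IllegalArgumentException("keyWords must not be empty");
        }
        List<String> values = new ArrayList<>();
        int position = 0; // how far the text has been searched so far
        while (true) {
            int index1 = text.indexOf(keyWords, position);
            if (index1 == -1) {
                break; // no more occurrences
            }
            int start = index1 + keyWords.length();
            int end = text.indexOf(endMarker, start);
            if (end == -1) {
                end = text.length(); // no end marker: take the rest of the text
            }
            values.add(text.substring(start, end).trim());
            position = end; // continue searching after this value
        }
        return values;
    }

    public static void main(String[] args) {
        // The second number is invented sample data for the demonstration.
        String text = "<td>telephoto：0591286756</td><td>telephoto：0591286757</td>";
        System.out.println(extractAll(text, "telephoto：", "<"));
    }
}
```

Because position always moves past the keyword that was just found, the loop cannot match the same occurrence twice, and it ends once indexOf returns -1.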