Urgent help needed! Building index files for a huge data set (6 million records) with the Lucene 2.0 search engine!

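My 6 million records are stored in the database across 45 tables, so when I build the index with Lucene 2.0 I also produce 45 separate index files and query them separately. My program now has to go through this data table by table (a single table holds at most 20,000 rows) and generate a Lucene index for each one.
My first idea was to process all the data of a table in one go, which, unsurprisingly, ran out of memory.
I have now switched to reading one record at a time: for each record I build a temporary index a and then add a to b, the overall index for that table. This works, but it runs very slowly, and the longer it runs the slower it gets.
Here are my timings, with a timestamp logged every 100 records:
At the start: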
10,20,30,40,50,60,70,80,90,100,
2007-11-06 22:37:09
110,120,130,140,150,160,170,180,190,200,
2007-11-06 22:37:12
210,220,230,240,250,260,270,280,290,300,
2007-11-06 22:37:16
310,320,330,340,350,360,370,380,390,400,
2007-11-06 22:37:20
After two thousand records:
1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,
2007-11-06 22:39:07
2010,2020,2030,2040,2050,2060,2070,2080,2090,2100,
2007-11-06 22:39:16
2110,2120,2130,2140,2150,2160,2170,2180,2190,2200,
2007-11-06 22:39:26
2210,2220,2230,2240,2250,2260,2270,2280,2290,2300,
2007-11-06 22:39:36
2310,2320,2330,2340,2350,2360,2370,2380,2390,2400,
2007-11-06 22:39:46
My program is mainly written like this:

    public class ResultIndexDao {

        /**
         * @param args
         */
        public static void main(String arg[]) {
            // The code that initializes ts and creates rs is not shown in this post.
            ConnectionOracle ts = null;

            try {
                FSDirectory directory = FSDirectory.getDirectory("D:\\eclipse 3.2\\JavaScape\\HXAIC\\Index\\Index");
                IndexWriter writer = new IndexWriter(directory, getAnalyzer(), true);

                int i = 0; int j = 0; int m = 0;
                // Read the table one record at a time.
                while (rs.next()) {
                    HashMap hap = new HashMap();
                    hap.put("fBZ", rs.getString("fBZ"));
                    // Build a separate index for this single record.
                    FSDirectory radmdy = writerIndex(hap);
                    hap = null;
                }
                writer.optimize();
                writer.close();

                ts.transactionCommit(false);
            } catch (Exception e) {
                e.printStackTrace();
                ts.transactionRollback(false);
            }
        }

        public static FSDirectory writerIndex(HashMap hmp) {
            TMCodeDao tmdao = new TMCodeDao();
            FSDirectory ramDir = null;
            try {
                // A new IndexWriter is opened, optimized and closed for every record.
                ramDir = FSDirectory.getDirectory("D:\\eclipse 3.2\\JavaScape\\HXAIC\\Index\\Index");
                IndexWriter writerRam = new IndexWriter(ramDir, getAnalyzer(), true);
                Document doc = new Document();

                // Note: main() stores the value under "fBZ"; HashMap keys are
                // case-sensitive, so this lookup of "fbz" returns null.
                if (hmp.get("fbz") != null)
                    doc.add(new Field("fbz", (String) hmp.get("fbz"), Field.Store.YES, Field.Index.NO));
                else
                    doc.add(new Field("fbz", "", Field.Store.YES, Field.Index.NO));

                writerRam.addDocument(doc);
                writerRam.optimize();
                writerRam.close();
            } catch (Exception e) {
                e.printStackTrace();
            }

            return ramDir;
        }

        public static Analyzer getAnalyzer() {
            return new ChineseAnalyzer();
        }
    }

How should I optimize this program to make it faster? Or is there a better way to generate Lucene index files? I'm fairly new to Lucene, so I'd be very grateful for any pointers from those who know it well.

Solutions »

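There are many ways to optimize Lucene. For example, you can speed things up by deciding, field by field, whether the field is tokenized and whether it is indexed at all, depending on whether that field will actually be searched.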
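To see why the record-by-record approach in the question keeps slowing down, here is a rough cost model in Java. It is a sketch under stated assumptions, not a measurement: it assumes that each optimize() rewrites every document already in the index, and it counts one unit of work per document written. The class and method names are made up for illustration.

```java
/** Toy cost model (an assumption, not a measurement): one unit = writing one document into the index. */
final class IndexCostModel {

    /** Optimize after every record: each optimize is assumed to rewrite all documents indexed so far. */
    static long unitsWithOptimizePerRecord(int records) {
        long units = 0;
        for (int indexed = 1; indexed <= records; indexed++) {
            units += indexed; // rewrite everything currently in the index
        }
        return units;
    }

    /** One writer stays open: each document is added once, then rewritten once by a single final optimize. */
    static long unitsWithSingleWriter(int records) {
        return 2L * records;
    }

    public static void main(String[] args) {
        int records = 20000; // largest table size quoted in the question
        System.out.println("optimize per record: " + unitsWithOptimizePerRecord(records));
        System.out.println("single writer:       " + unitsWithSingleWriter(records));
    }
}
```

For the 20,000-row table mentioned in the question, the model gives 200,010,000 units for the per-record approach versus 40,000 for a single writer, which is in line with the timings getting worse as the index grows. In practice this suggests keeping one IndexWriter open per table, calling addDocument for each row, and calling optimize() and close() only once at the end.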
Keywords: lucene.net, search sorting, memory surge, out of memory, IndexSearcher, TopDocs, weight

    /** Creates a searcher searching the index in the named directory. */
    public IndexSearcher(String path) throws IOException {
        this(IndexReader.open(path), true);
    }

    /** Creates a searcher searching the index in the provided directory. */
    public IndexSearcher(Directory directory) throws IOException {
        this(IndexReader.open(directory), true);
    }

    /** Creates a searcher searching the provided index. */
    public IndexSearcher(IndexReader r) {
        this(r, false);
    }

    private IndexSearcher(IndexReader r, boolean closeReader) {
        reader = r;
        this.closeReader = closeReader;
    }

Many people using Lucene have probably run into this situation. When the index is very large (over 10 GB) and IndexSearcher objects are created with the first two constructors, every new IndexSearcher opens its own index object (in effect, another connection to the same index), and each index object takes up a certain amount of system resources, mainly memory. When many users hit the system at once, memory usage climbs steadily until the process runs out of memory: "java heap space" in Java, or an out-of-memory error in .NET. This problem can be solved as follows. The ultimate solution:
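One common way to avoid the per-request cost described above (not necessarily the fix the author had in mind) is to open the index once and hand the same searcher to every request. The sketch below shows only the sharing pattern in plain Java; SharedResource and its opener parameter are hypothetical names, and the opener stands in for whatever code opens the index, for example one of the IndexSearcher constructors quoted above.

```java
import java.util.function.Supplier;

/** Lazily opens one expensive resource and hands the same instance to every caller. */
final class SharedResource<T> {
    private final Supplier<T> opener; // hypothetical hook: the code that opens the index
    private T instance;               // guarded by "this"

    SharedResource(Supplier<T> opener) {
        this.opener = opener;
    }

    /** Returns the shared instance, calling the opener on first use only. */
    synchronized T get() {
        if (instance == null) {
            instance = opener.get();
        }
        return instance;
    }
}

final class SharedResourceDemo {
    public static void main(String[] args) {
        int[] opens = {0}; // counts how often the expensive open actually runs
        SharedResource<Object> shared = new SharedResource<>(() -> {
            opens[0]++;
            return new Object(); // stands in for an opened searcher
        });
        for (int request = 0; request < 1000; request++) {
            shared.get(); // every simulated request reuses the same instance
        }
        System.out.println("opens = " + opens[0]);
    }
}
```

With this pattern the memory held by the open index is paid once rather than once per user. The trade-off is that a shared searcher keeps seeing the index as it was when opened, so it has to be replaced when the index is rebuilt.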