lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dan Quaroni <dquar...@OPENRATINGS.com>
Subject Fastest batch indexing with 1.3-rc1
Date Wed, 20 Aug 2003 18:03:43 GMT
Hey there.  What's the fastest way to do a batch index with lucene 1.3-rc1
on a dual or quad-processor box?  The files I'm indexing are very easy to
split divide among multiple threads.

Here's what I've done at this point:

Each thread has its own IndexWriter writing to its own RAMDirectory.  Every
<number> of documents, I mergeIndexes the thread's index to the main disk
index.

The thread writers have a mergeFactor of 50.
The disk indexWriter has a mergeFactor of 30.
I call optimize only on the main disk index, and only once at the very end.

Just doing this has shown great improvements for me, but I want to squeeze
out every bit of performance I can.  What's the fastest way to mergeIndexes?
Should I use a low mergeFactor when working with RAMDirectorys?  Should I
optimize the thread index before I merge it to the main one?

Thanks!

Mime
View raw message