lucene-java-user mailing list archives

From Chris Lamprecht <>
Subject Re: Indexing in multi-threaded environment
Date Wed, 04 May 2005 04:28:43 GMT
Hi Sodel,

You could use a single queue, where one thread pulls things off the
queue and any number of threads put things on the queue.  Each worker
thread can index, say, 1000 documents into its own RAMDirectory and
then enqueue that RAMDirectory.  When the queue reaches a certain
size, the single consumer thread can empty the queue and call
IndexWriter.addIndexes().  A blocking queue is best; there's probably
one in Doug Lea's util.concurrent, or in Java 1.5 there is

I've done exactly what you describe, using N threads where N is the
number of processors on the machine, plus one more thread that writes
to the file system index (since that is I/O-bound anyway).  Since most
of the CPU time goes to tokenizing, stemming, etc., the method works
well.  The main drawback is that IndexWriter.addIndexes(Directory[])
always calls optimize, which takes a lot of time as the index grows.
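
Here is a rough sketch of the pattern, not my actual code.  To keep it
self-contained it uses only the JDK: the Batch class is a hypothetical
stand-in for a RAMDirectory, the diskIndex list stands in for the
on-disk index, and the loop that copies batches into it marks where the
real IndexWriter.addIndexes(Directory[]) call would go.  The class name,
method name, and parameters are all made up for illustration, and
java.util.concurrent.LinkedBlockingQueue is just one possible choice of
blocking queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BatchMergeSketch {
    // Hypothetical stand-in for a RAMDirectory holding one batch of documents.
    static final class Batch {
        final List<String> docs = new ArrayList<>();
    }

    // Sentinel that tells the merger thread no more batches are coming.
    static final Batch POISON = new Batch();

    /** Indexes all input docs and returns how many reached the stand-in disk index. */
    static int indexAll(List<String> input, int workers, int batchSize, int mergeThreshold) {
        BlockingQueue<Batch> queue = new LinkedBlockingQueue<>();
        List<String> diskIndex = new ArrayList<>(); // written only by the merger thread

        // Single consumer: collects batches and merges them in groups.
        Thread merger = new Thread(() -> {
            List<Batch> pending = new ArrayList<>();
            try {
                while (true) {
                    Batch b = queue.take(); // blocks until a batch is available
                    boolean done = (b == POISON);
                    if (!done) {
                        pending.add(b);
                    }
                    if (done || pending.size() >= mergeThreshold) {
                        // A real version would call addIndexes on the pending directories here.
                        for (Batch p : pending) {
                            diskIndex.addAll(p.docs);
                        }
                        pending.clear();
                    }
                    if (done) {
                        return;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        merger.start();

        // Producers: each task builds one in-memory batch, then enqueues it.
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int start = 0; start < input.size(); start += batchSize) {
            final List<String> slice =
                input.subList(start, Math.min(start + batchSize, input.size()));
            pool.execute(() -> {
                Batch b = new Batch();
                for (String doc : slice) {
                    b.docs.add(doc.toLowerCase()); // stand-in for tokenizing/stemming
                }
                queue.add(b);
            });
        }

        try {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
            queue.put(POISON); // all workers finished; let the merger flush and exit
            merger.join();     // join makes the merger's writes visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        return diskIndex.size();
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add("Document " + i);
        }
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(indexAll(docs, cpus, 1000, 2) + " documents merged");
    }
}
```

The worker count is the number of processors, and the merger is the one
extra thread.  Only the merger ever touches the stand-in disk index, so
nothing needs to synchronize on it.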

On 5/3/05, Sodel Vazquez-Reyes wrote:
> Hi,
> I am starting my application in a multi-threaded environment.
> Could somebody show me any examples that serialize calls to
> IndexWriter.addDocument(Document)?
> My idea is to use RAMDirectory-based indexes in parallel, one in each
> thread, and merge them into a single index on disk using the
> IndexWriter.addIndexes(Directory[]) method. It works with a single
> process, but I have problems with my threads implementation.
> Any other ideas about this would also be welcome.
> Best regards.
> Sodel.
> --
> Sodel Vazquez-Reyes
> PhD Student