lucene-java-user mailing list archives

From Antony Bowesman <...@teamware.com>
Subject Multi-threaded IndexWriter
Date Tue, 03 Oct 2006 05:58:25 GMT
Hi,

I have a multi-threaded indexing application that indexes documents into a set
of Lucene index databases (I have millions of documents to index, hence the
split DB).  When a thread gets an index request, it determines which index DB
the data belongs in and grabs the IndexWriter for that database.

My question is this: suppose several threads want to index data into the same
DB concurrently, while other threads want to delete documents and others are
searching.  Does anyone know the benefits and drawbacks of the following
approaches with respect to the performance characteristics of the Lucene
internals?

a) Serialisation of writes, i.e. multiple IndexWriter.close() calls.  Each
thread blocks waiting for the writer and then does

new IndexWriter()
addDocuments()
close IndexWriter

for each thread, or

b) Parallelisation of writes with a single IndexWriter.close() call.  Allow all
threads to share the same IndexWriter instance.  LIA (Lucene in Action) says
that an IndexWriter can safely be shared between several threads.  So the first
thread requesting the writer creates a new instance, all subsequent threads add
documents to the same instance, and the last user closes the writer, e.g.

First thread - new IndexWriter()
2..n threads - inc use_count + get existing IndexWriter
all threads - addDocuments()
n..2 threads - dec use_count
Last thread - close IndexWriter

The middle three steps will of course interleave in arbitrary order, not in
the order listed above.
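
To make approach (b) concrete, here is a rough sketch of the use-count
bookkeeping I have in mind.  It is only an illustration, not tested against
Lucene: DocWriter, WriterFactory and SharedWriterPool are made-up names, and
DocWriter is a stand-in for IndexWriter so that the snippet compiles without
Lucene on the classpath.

```java
/** Hypothetical stand-in for Lucene's IndexWriter (assumption, not the real API). */
interface DocWriter {
    void addDocument(String doc);
    void close();
}

/** Shares one writer per index DB and closes it when the last user lets go. */
class SharedWriterPool {
    /** Hypothetical factory; real code would construct an IndexWriter here. */
    interface WriterFactory {
        DocWriter open();
    }

    private final WriterFactory factory;
    private DocWriter writer; // null while no thread is using this index DB
    private int useCount;     // number of threads currently holding the writer

    SharedWriterPool(WriterFactory factory) {
        this.factory = factory;
    }

    /** First thread creates the writer; threads 2..n only increment use_count. */
    synchronized DocWriter acquire() {
        if (useCount == 0) {
            writer = factory.open();
        }
        useCount++;
        return writer;
    }

    /** Threads n..2 only decrement use_count; the last thread closes the writer. */
    synchronized void release() {
        if (useCount == 0) {
            throw new IllegalStateException("release() without matching acquire()");
        }
        useCount--;
        if (useCount == 0) {
            // Closing inside the lock keeps a new acquire() from opening a
            // second writer before this one has finished closing.
            writer.close();
            writer = null;
        }
    }
}
```

Each indexing thread would call acquire(), add its documents, and call
release() in a finally block.  Approach (a) would instead hold a per-DB lock
around the whole new/add/close sequence.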

Thanks
Antony

