lucene-java-user mailing list archives

From "Glen Newton" <>
Subject Re: Indexing Scalability, Multiwriter?
Date Sat, 11 Oct 2008 02:17:45 GMT
IndexWriter is thread-safe, and has been for a while,
so you don't have to worry about that.

As I reported in my blog in April (though perhaps not explicitly
enough): when indexing 6.4M full-text articles, which produced an 83GB
index, I used a pipeline architecture consisting of several
ThreadPoolExecutors:

1 - A main program that gets the article metadata (author, title,
abstract, etc.) via JDBC, creates an Article object, and adds it to #2.

2 - A pool with a queue of 100 Article objects; the Runnable reads the
full text of the article from the file system. The files are gzipped,
so they are also decompressed here. The full text is added to the
Article object, and the Article object is added to queue #3. 4 threads
(more threads cause major performance degradation through I/O waits).

3 - A pool with a queue of 1000 Article objects; the Runnable creates
a Lucene Document from the Article object fields and adds the Document
to queue #4. 64 threads are running in this pool.

4 - A pool with a queue of 100 Documents; the Runnable adds the
Document to one of 8 IndexWriters, assigned round-robin. 16 threads
run in this pool.
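The pipeline above can be sketched with plain java.util.concurrent. This is a minimal, hypothetical sketch, not the code from the blog: Article, Doc, and FakeWriter are stand-ins (FakeWriter takes the place of a Lucene IndexWriter so the example runs without Lucene), the full-text read is faked instead of reading and gunzipping files, and using CallerRunsPolicy to apply backpressure when a bounded queue fills up is an assumption, since the post doesn't say how full queues were handled. Thread counts and queue sizes follow the post.

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class IndexingPipeline {
    // Hypothetical stand-in for the post's Article object.
    static final class Article {
        final int id;
        final String title;
        String fullText; // filled in by stage 2; the executor hand-off makes it visible to stage 3
        Article(int id, String title) { this.id = id; this.title = title; }
    }

    // Hypothetical stand-in for a Lucene Document: a map of field name to value.
    static final class Doc {
        final Map<String, String> fields;
        Doc(Map<String, String> fields) { this.fields = fields; }
    }

    // Hypothetical stand-in for an IndexWriter; the concurrent queue keeps it thread-safe.
    static final class FakeWriter {
        final ConcurrentLinkedQueue<Doc> docs = new ConcurrentLinkedQueue<>();
        void addDocument(Doc d) { docs.add(d); }
    }

    // Fixed-size pool with a bounded queue. When the queue is full, CallerRunsPolicy
    // runs the task on the submitting thread, which slows the upstream stage down.
    static ThreadPoolExecutor pool(int threads, int queueSize) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize), new ThreadPoolExecutor.CallerRunsPolicy());
    }

    final FakeWriter[] writers;
    final AtomicInteger nextWriter = new AtomicInteger();
    final ThreadPoolExecutor readPool = pool(4, 100);   // stage 2: read full text
    final ThreadPoolExecutor docPool = pool(64, 1000);  // stage 3: build documents
    final ThreadPoolExecutor writePool = pool(16, 100); // stage 4: add to writers

    IndexingPipeline(int numWriters) {
        writers = new FakeWriter[numWriters];
        for (int i = 0; i < numWriters; i++) writers[i] = new FakeWriter();
    }

    // Called by stage 1 (the JDBC loop in the post) for each Article.
    void submit(Article a) {
        readPool.execute(() -> {
            a.fullText = "full text of article " + a.id; // real code: read and gunzip the file
            docPool.execute(() -> {
                Doc d = new Doc(Map.of("id", String.valueOf(a.id), "title", a.title, "body", a.fullText));
                writePool.execute(() -> {
                    // Round-robin: each document takes the next writer slot.
                    int w = Math.floorMod(nextWriter.getAndIncrement(), writers.length);
                    writers[w].addDocument(d);
                });
            });
        });
    }

    // Drain the stages in order: once a stage has terminated, nothing more can be
    // submitted to the next one, so it is safe to shut that one down too.
    void finish() {
        try {
            for (ThreadPoolExecutor stage : new ThreadPoolExecutor[] {readPool, docPool, writePool}) {
                stage.shutdown();
                if (!stage.awaitTermination(1, TimeUnit.MINUTES)) {
                    throw new IllegalStateException("stage did not drain in time");
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        IndexingPipeline p = new IndexingPipeline(8);
        for (int i = 0; i < 1000; i++) p.submit(new Article(i, "title " + i));
        p.finish();
        int total = 0;
        for (FakeWriter w : p.writers) total += w.docs.size();
        System.out.println("indexed " + total);
    }
}
```

Running main prints "indexed 1000", and each of the 8 writers ends up with exactly 125 documents because the shared counter hands out every slot once. In real code each writer would be an IndexWriter on its own Directory; after all stages drain, the per-writer indexes could be combined with IndexWriter.addIndexes and then optimized (optimize() in the Lucene 2.x API of that period; later versions replaced it with forceMerge).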

When all documents are processed, the 8 indexes are merged into a
single index and optimized. From the blog entry: 20.5 hours to process
6.4M articles (143GB of text). See the entry for software/VM/hardware
details.
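For a sense of scale, the figures above work out to roughly 87 articles and about 1.9MB of text per second, or about 22KB of text per article. This back-of-the-envelope calculation is not from the original post and assumes decimal units (1GB = 1000MB, 1MB = 1000KB):

```java
public class Throughput {
    // Items processed per second over a run of the given length in hours.
    static double perSecond(double items, double hours) {
        return items / (hours * 3600);
    }

    public static void main(String[] args) {
        double articles = 6.4e6; // from the post
        double textMB = 143_000; // 143GB, assuming 1GB = 1000MB
        double hours = 20.5;     // from the post
        System.out.printf("%.1f articles/s%n", perSecond(articles, hours));
        System.out.printf("%.2f MB of text/s%n", perSecond(textMB, hours));
        System.out.printf("%.1f KB of text per article%n", textMB * 1000 / articles);
    }
}
```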

I tried all combinations of threads/pool sizes/number of IndexWriters,
and the above was the sweet spot for my particular index and hardware.

I hope this is helpful. If you have any questions, please let me know.



2008/10/10 Darren Govoni <>:
> Hi gang,
>  Wondering how folks have addressed scaled-up indexing. I saw old threads
> about using clustered webapp with JNDI singleton index writer due to the
> Lucene single writer limitation. Is this limitation lifted in 3 maybe?
> Is there a best strategy for parallel writing to an index by many
> threads?
> thanks for any tips! You guys rock.
> Darren



