lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ryan Aslett" <>
Subject addIndexes() Question
Date Wed, 22 Dec 2004 23:45:15 GMT
Hi there, Im about to embark on a Lucene project of massive scale
(between 500 million and 2 billion documents).  I am currently working
on parallellizing the construction of the Index(es). 

Rough summary of my plan:
I have many, many physical machines, each with multiple processors that
I wish to dedicate to the construction of a single index. 
I plan on having each machine gather its documents from a central
sychronized source (network, JMS, whatever). 
Within each machine I will have multiple threads each responsible for
construcing an index slice.

When all machines and all threads are finished, I should have a slew of
index slices that I want to combine together to create one index.

My question is this:  Will it be more efficient to call
addIndexes(Directory[] dirs) on all the slices all at once? 

Or might it be better to continually merge small indexes into a larger
index, i.e. once an index slice reaches a particular size, merge it into
the main index and start building a new slice...

Any help would be appreciated.. 

Ryan Aslett

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message