lucene-java-user mailing list archives

From Otis Gospodnetic <>
Subject Re: addIndexes() Question
Date Thu, 23 Dec 2004 02:19:11 GMT
I _think_ you'd be better off merging them all at once, but I wouldn't
trust myself on this. I'd instead construct a small 3-index set and
test both approaches, looking at a) peak disk usage, b) time, and
c) RAM usage.
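
For the test itself, something along these lines might do as a starting
point. It is only a rough sketch using plain JDK calls: MergeProbe,
MergeTask and Result are made-up names, not Lucene classes, and the 20 ms
sampling interval is arbitrary. The merge you hand to measure() would be
whatever you are comparing, e.g. one addIndexes(Directory[] dirs) call
over all slices versus a loop that merges one slice at a time. Heap
numbers taken this way are approximate, since they depend on when the
garbage collector runs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Stream;

/** Hypothetical measurement harness; not part of Lucene. */
class MergeProbe {

    /** Measurements collected for one merge run. */
    static final class Result {
        final long elapsedMillis;
        final long peakDiskBytes;
        final long peakHeapBytes;

        Result(long elapsedMillis, long peakDiskBytes, long peakHeapBytes) {
            this.elapsedMillis = elapsedMillis;
            this.peakDiskBytes = peakDiskBytes;
            this.peakHeapBytes = peakHeapBytes;
        }
    }

    /** The merge step under test; allowed to throw checked exceptions. */
    interface MergeTask {
        void run() throws Exception;
    }

    /** Sums the sizes of all regular files under dir (0 if it cannot be read). */
    static long diskUsage(Path dir) {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile).mapToLong(p -> {
                try {
                    return Files.size(p);
                } catch (IOException e) {
                    return 0L; // file vanished mid-scan, e.g. an old segment was deleted
                }
            }).sum();
        } catch (IOException | UncheckedIOException e) {
            return 0L;
        }
    }

    /** Approximate heap currently in use by this JVM. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs the task while a background thread samples disk and heap usage. */
    static Result measure(Path indexDir, MergeTask task) throws Exception {
        AtomicLong peakDisk = new AtomicLong(diskUsage(indexDir));
        AtomicLong peakHeap = new AtomicLong(usedHeap());
        AtomicBoolean done = new AtomicBoolean(false);

        Thread sampler = new Thread(() -> {
            while (!done.get()) {
                peakDisk.accumulateAndGet(diskUsage(indexDir), Math::max);
                peakHeap.accumulateAndGet(usedHeap(), Math::max);
                try {
                    Thread.sleep(20); // arbitrary sampling interval
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        sampler.setDaemon(true);
        sampler.start();

        long start = System.nanoTime();
        long elapsedNanos;
        try {
            task.run();
        } finally {
            elapsedNanos = System.nanoTime() - start;
            done.set(true);
            sampler.join();
        }
        // Take one last sample so the final on-disk size is always counted.
        peakDisk.accumulateAndGet(diskUsage(indexDir), Math::max);
        peakHeap.accumulateAndGet(usedHeap(), Math::max);
        return new Result(elapsedNanos / 1_000_000, peakDisk.get(), peakHeap.get());
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload: writes one file where a real run would merge indexes.
        Path dir = Files.createTempDirectory("merge-probe");
        Result r = measure(dir, () -> Files.write(dir.resolve("segment.bin"), new byte[1 << 20]));
        System.out.println("ms=" + r.elapsedMillis
                + " peakDisk=" + r.peakDiskBytes
                + " peakHeap=" + r.peakHeapBytes);
    }
}
```

Run it once per strategy against a fresh copy of the same three slices,
pointing indexDir at the directory that holds the merged index, and
compare the three numbers.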


--- Ryan Aslett <> wrote:

> Hi there, I'm about to embark on a Lucene project of massive scale
> (between 500 million and 2 billion documents). I am currently working
> on parallelizing the construction of the index(es).
> Rough summary of my plan:
> I have many, many physical machines, each with multiple processors,
> that I wish to dedicate to the construction of a single index.
> I plan on having each machine gather its documents from a central
> synchronized source (network, JMS, whatever).
> Within each machine I will have multiple threads, each responsible for
> constructing an index slice.
> When all machines and all threads are finished, I should have a slew
> of index slices that I want to combine to create one index.
> My question is this: Will it be more efficient to call
> addIndexes(Directory[] dirs) on all the slices at once?
> Or might it be better to continually merge small indexes into a larger
> index, i.e. once an index slice reaches a particular size, merge it
> into the main index and start building a new slice?
> Any help would be appreciated.
> Ryan Aslett
