lucene-java-user mailing list archives

From Sergiu Gordea <>
Subject Re: addIndexes() Question
Date Thu, 23 Dec 2004 08:30:04 GMT
I think you should adjust your plans a little and keep in mind that
your goal is to create a fast search engine, not a fast indexing engine.
If you index a lot of documents, you are likely to create a lot of
segments (if you don't optimize the index), and search will be very
slow compared with search on an optimized index.
The problem is that optimizing a big index is a time-consuming
operation, and I think addIndexes(Directory[] dirs) is a
time-consuming operation as well.
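
To make the cost concrete, here is a rough back-of-the-envelope sketch comparing the two strategies from your question. It does not use Lucene at all. It only counts how many documents get written, under the assumption (mine, not measured) that every merge into the main index rewrites the whole main index, the way an optimize does. The class and method names are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical cost model, not Lucene code: counts documents written
// under each merge strategy.
public class MergeCostSketch {

    // All slices merged in one pass: each document is copied exactly once
    // into the final index.
    public static long costAllAtOnce(long[] sliceSizes) {
        return Arrays.stream(sliceSizes).sum();
    }

    // Each slice is merged into the main index as soon as it is ready.
    // ASSUMPTION: every merge rewrites the entire main index, so the
    // documents already in it are copied again each time.
    public static long costIncremental(long[] sliceSizes) {
        long mainIndexSize = 0;
        long written = 0;
        for (long slice : sliceSizes) {
            mainIndexSize += slice;   // main index grows by this slice
            written += mainIndexSize; // whole main index is rewritten
        }
        return written;
    }

    public static void main(String[] args) {
        long[] slices = new long[10];
        Arrays.fill(slices, 1_000_000L); // ten slices of one million docs each
        System.out.println("all at once: " + costAllAtOnce(slices));
        System.out.println("incremental: " + costIncremental(slices));
    }
}
```

Under that assumption the incremental strategy grows roughly with the square of the number of slices, while the single merge grows linearly. If merges into the main index do not rewrite it completely, the gap will be smaller, so treat this only as a way to reason about the trade-off.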

Therefore I suggest you first think about how to design the indices for fast search, and then
design an offline indexing process.

That is my suggestion ... maybe it doesn't fit your requirements, maybe it does ...
  All the best,


Ryan Aslett wrote:

>Hi there, I'm about to embark on a Lucene project of massive scale
>(between 500 million and 2 billion documents).  I am currently working
>on parallelizing the construction of the Index(es). 
>Rough summary of my plan:
>I have many, many physical machines, each with multiple processors that
>I wish to dedicate to the construction of a single index. 
>I plan on having each machine gather its documents from a central
>synchronized source (network, JMS, whatever). 
>Within each machine I will have multiple threads each responsible for
>constructing an index slice.
>When all machines and all threads are finished, I should have a slew of
>index slices that I want to combine together to create one index.
>My question is this:  Will it be more efficient to call
>addIndexes(Directory[] dirs) on all the slices all at once? 
>Or might it be better to continually merge small indexes into a larger
>index, i.e. once an index slice reaches a particular size, merge it into
>the main index and start building a new slice...
>Any help would be appreciated.. 
>Ryan Aslett