lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Garrett Heaver" <garrett.hea...@researchandmarkets.com>
Subject RE: addIndexes() Question
Date Thu, 23 Dec 2004 09:52:28 GMT
Hi Ryan

I too am using addIndexes(), all be it for slightly different reasons.
However, I would recommend only calling addIndexes() for fairly sizable
slices and all slices at once. The reason I'm suggesting it is that optimize
is called automagically both before and after the addIndexes method so if
you are only adding very small slices you're optimizing the main index more
times than necessary

There is of course the obvious trade of "spider --> live index" time being
shorter in one method that the other.

The other thing that I found on my machines (I'm spidering on one machine
and storing the live index on another) is that network performance isn't so
hot when you are continually opening and closing connections on other
machines to do the merge (under NT this is, Linux may be much better :) so
it made more sense for me to create larger slices and only open the
connection to the live index machine when necessary

Hope this helps

Garrett

-----Original Message-----
From: Ryan Aslett [mailto:Ryan.Aslett@Qsent.com] 
Sent: 22 December 2004 23:45
To: Lucene Users List
Subject: addIndexes() Question

 
Hi there, Im about to embark on a Lucene project of massive scale
(between 500 million and 2 billion documents).  I am currently working
on parallellizing the construction of the Index(es). 

Rough summary of my plan:
I have many, many physical machines, each with multiple processors that
I wish to dedicate to the construction of a single index. 
I plan on having each machine gather its documents from a central
sychronized source (network, JMS, whatever). 
Within each machine I will have multiple threads each responsible for
construcing an index slice.

When all machines and all threads are finished, I should have a slew of
index slices that I want to combine together to create one index.

My question is this:  Will it be more efficient to call
addIndexes(Directory[] dirs) on all the slices all at once? 

Or might it be better to continually merge small indexes into a larger
index, i.e. once an index slice reaches a particular size, merge it into
the main index and start building a new slice...

Any help would be appreciated.. 

Ryan Aslett


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message