lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dalton, Jeffery" <jdal...@globalspec.com>
Subject Lucene Merge Algorithm, max number of segments
Date Mon, 06 Mar 2006 21:57:59 GMT
I am just going to wax philosophical for a minute.  I am trying to
understand lucene's merging algorithm in depth.  

Let's say I create an index of 25M web pages on a single machine.  While
creating this index I am doing both search and indexing / re-indexing at
the same time, a bit like Technorati
(http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg12709.htm
l), and so I assume a merge factor of 2, as Doug references in the
Technorati post.  According to the algorithm described in Doug's pisa
talk: http://lucene.sourceforge.net/talks/pisa/ the average number of
indices when I get near 25M docs is: 
2 * log2(25,000,000) / 2 = 24.6 indices.

According to the FAQ, http://wiki.apache.org/jakarta-lucene/LuceneFAQ,
there is no way to set a hard limit on the number of indices.  It seems
to me that it might be nice to be able to impose a hard limit on the
number of segments, in this case, perhaps 5 or 10 indices.  

Does anyone have any experience with this?  Has anyone tried imposing a
hard limit?  

Thanks,

- Jeff



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message