lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Morus Walter <morus.wal...@tanto.de>
Subject Re: Search performance with one index vs. many indexes
Date Mon, 28 Feb 2005 07:30:21 GMT
Jochen Franke writes:
> Topic: Search performance with large numbers of indexes vs. one large index
> 
> 
> My questions are:
> 
> - Is the size of the "wordlist" the problem?
> - Would we be a lot faster, when we have a smaller number
> of files per index?

sure. 
Look:
Index lookup of a word is O(ln(n)) where n is the number of words.
Index lookup of a word in k indexes having m words is O( k ln(m) )
In the best case all word lists are distict (purely theoretical), 
that is n = k*m or m = n/k
For n = 15 Mio, k = 800
ln(n) = 16.5
k*ln(n/k) = 7871
In a realistic case, m is much bigger since word lists won't be distinct.
But it's the linear factor k that bites you.
In the worst case (all words in all indices) you have
k*ln(n) = 13218.8

HTH
	Morus

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message