lucene-java-user mailing list archives

Site index · List index
Message view
Top
From "Runde, Kevin" <kru...@NameProtect.com>
Subject RE: Search performance with one index vs. many indexes
Date Mon, 28 Feb 2005 13:33:52 GMT
```Follow Up to the article from Friday

-----Original Message-----
From: Morus Walter [mailto:morus.walter@tanto.de]
Sent: Monday, February 28, 2005 1:30 AM
To: Lucene Users List
Subject: Re: Search performance with one index vs. many indexes

Jochen Franke writes:
> Topic: Search performance with large numbers of indexes vs. one large
index
>
>
> My questions are:
>
> - Is the size of the "wordlist" the problem?
> - Would we be a lot faster, when we have a smaller number
> of files per index?

sure.
Look:
Index lookup of a word is O(ln(n)) where n is the number of words.
Index lookup of a word in k indexes having m words is O( k ln(m) )
In the best case all word lists are distict (purely theoretical),
that is n = k*m or m = n/k
For n = 15 Mio, k = 800
ln(n) = 16.5
k*ln(n/k) = 7871
In a realistic case, m is much bigger since word lists won't be
distinct.
But it's the linear factor k that bites you.
In the worst case (all words in all indices) you have
k*ln(n) = 13218.8

HTH
Morus

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org