lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Klaas <>
Subject Re: possible TermInfosReader speedup
Date Thu, 09 Apr 2009 18:51:43 GMT

On 8-Apr-09, at 11:13 PM, Michael Busch wrote:
> I was thinking about doing this as part of LUCENE-1195. However, I  
> doubt that the net win will be very noticeable here. A common  
> scenario is that you have an index with one big body field that has  
> a lot of unique terms, plus several metafields that contribute less  
> unique terms. Even if all metafields together would contribute the  
> same amount of additional unique terms as the body field, this  
> proposed change would only save one term comparison per body term  
> lookup. The reason is the O(log(n)) of the in-memory binary search.
> The story is a bit different for looking up terms on the smaller  
> metafields. Here you could probably save more term comparisons. But  
> I still think the improvement here would at the end be in the noise.  
> I mean how long do e.g. 30 in-memory term comparisons take compared  
> to all the disk seeks, sequential I/Os, VInts decodings, etc. that  
> every search needs to do? And you probably never have more than 2^30  
> unique terms in your index.
> So I doubt this improvement will be noticeable, but I would be happy  
> if you proved me wrong and this was indeed a long hanging fruit.

I had a similar thought.  It is hard to improve logarithmic  
algorithms, since you need to reduce N exponentially to get a linear  
speed up.  OTOH, I have a few indices with several fields with  
multiple-millions of terms each (I don't think that is uncommon.  Even  
indexing a docId per doc can cause this kind of situation).   Also, in  
your scenario, even if the main term search isn't much accelerated,  
the metadata field lookups might be much faster, which could add up to  
some sort of win.

Like Michael, I'm doubtful, but see no reason not to try it!


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message