lucene-dev mailing list archives

From: Mike Klaas
Subject: Re: Performance Improvement for Search using PriorityQueue
Date: Mon, 10 Dec 2007 20:24:47 GMT
On 10-Dec-07, at 12:11 PM, Shai Erera wrote:

> Actually, queries on large indexes are not necessarily I/O bound. It depends
> on how much of the posting list is read into memory at once. I'm not that
> familiar with the internals of Lucene, but let's assume a posting element
> takes 4 bytes for the docId and 2 more bytes per position in a document
> (that's without compression; I'm sure Lucene does some compression on the
> doc IDs). So I don't think I'll be far off in guessing that a posting
> element takes at most 10 bytes, which means that 1M posting elements take
> 10MB (this is considered a very long posting list).
> Therefore, if you read it into memory in chunks (16, 32, 64 KB), the query
> spends most of its time in the CPU, computing the scores, the PQ, etc. The
> real I/O operations only involve reading fragments of the posting list into
> memory. On today's hardware, reading 10MB into memory is pretty fast.
> So I wouldn't be surprised here (unless I misunderstood you).
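
For reference, the back-of-envelope estimate quoted above can be written out as a small sketch. The 10 bytes per posting element is the quoted guess, not Lucene's actual on-disk format, and the function names are hypothetical, chosen only for illustration.

```python
def posting_list_bytes(num_postings, bytes_per_posting=10):
    """Upper-bound size of an uncompressed posting list, in bytes.

    bytes_per_posting=10 is the guess from the quoted message
    (4 bytes docId plus a few 2-byte positions), not a Lucene constant.
    """
    return num_postings * bytes_per_posting


def chunk_reads(total_bytes, chunk_bytes):
    """Number of sequential reads needed to stream total_bytes in chunks."""
    return -(-total_bytes // chunk_bytes)  # ceiling division


total = posting_list_bytes(1_000_000)
print(f"posting list: {total / 1e6:.0f} MB")
for kb in (16, 32, 64):
    print(f"{kb} KB chunks: {chunk_reads(total, kb * 1024)} reads")
```

This only counts bytes and reads; it says nothing about how long each read takes when the data is not already in the OS disk cache.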

My experience is that queries against indices which haven't been
warmed into the OS disk cache are many times slower (this is
especially true if the prox file is used at all).

I initially assumed that you had cleared the OS disk cache between
the runs of the two algorithms, and were seeing a difference in
uncached query performance. I assume from your comments, though, that
this isn't the case at all.

