lucene-java-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Fast access to a random page of the search results.
Date Tue, 01 Mar 2005 22:04:15 GMT
Daniel Naber wrote:
> After fixing this I can reproduce the problem with a local index that 
> contains about 220.000 documents (700MB). Fetching the first document 
> takes for example 30ms, fetching the last one takes >100ms. Of course I 
> tested this with a query that returns many results (about 50.000). 
> Actually it happens even with the default sorting, no need to sort by some 
> specific field.

In part this is due to the fact that Hits first searches for the 
top-scoring 100 documents.  Then, if you ask for a hit after that, it 
must re-query.  In part this is also due to the fact that maintaining a 
queue of the top 50k hits is more expensive than maintaining a queue of 
the top 100 hits, so the second query is slower.  And in part this could 
be caused by other factors: for example, the highest-ranking document 
may tend to be cached and so require no disk I/O.
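
To make the re-query effect concrete, here is a minimal sketch in plain 
Java. It is not Lucene's Hits class: LazyHitsSketch, its searcher callback 
and the doubling growth policy are all hypothetical, chosen only to show 
why the first access past the initially fetched 100 hits triggers a 
second, larger search.

```java
import java.util.function.IntFunction;

/** Hypothetical stand-in for a lazily growing result list; NOT Lucene's Hits. */
public class LazyHitsSketch {
    // Assumption: the callback runs the whole query and returns the ids of
    // the top-n documents, best first (fewer than n if fewer documents match).
    private final IntFunction<int[]> searcher;
    private int[] cached = new int[0];
    private int fetched = 0;   // the n requested by the most recent search
    private int searches = 0;  // how many times the query has been run

    public LazyHitsSketch(IntFunction<int[]> searcher) {
        this.searcher = searcher;
        fetch(100); // initial query: only the top-scoring 100 documents
    }

    private void fetch(int n) {
        cached = searcher.apply(n);
        fetched = n;
        searches++;
    }

    /** Returns the id of the i-th hit, re-running the query if it was not fetched yet. */
    public int doc(int i) {
        if (i >= fetched) {
            // Assumed growth policy: at least double, and enough to cover i.
            fetch(Math.max(i + 1, fetched * 2));
        }
        return cached[i];
    }

    public int searchCount() {
        return searches;
    }
}
```

With this sketch, reading hit 0 costs one search, while reading hit 
49,999 costs a second search that has to keep a 50,000-entry queue.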

One could perform profiling to determine which is the largest factor. 
Of these, only the first is really fixable: if you know you'll need hit 
50k then you could tell this to Hits and have it perform only a single 
query.  But when the query matches only about 50k documents, as here, 
the algorithmic cost of keeping a queue of the top 50k is the same as 
collecting all the hits and sorting them.  So, in part, getting hits 
49,990 through 50,000 is inherently slower than getting hits 0-10.  We 
can minimize that, but not eliminate it.
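
The cost argument can be illustrated with a bounded min-heap, sketched 
here with java.util.PriorityQueue rather than Lucene's own queue class. 
TopKSketch and topK are hypothetical names. Collecting the top k of n 
scores this way costs O(n log k), so with k = 100 the queue work per hit 
is small, but as k approaches n the cost approaches the O(n log n) of 
sorting every hit.

```java
import java.util.PriorityQueue;

/** Hypothetical illustration of top-k collection; not Lucene code. */
public class TopKSketch {
    /** Returns the k highest scores in descending order using a bounded min-heap. */
    public static float[] topK(float[] scores, int k) {
        if (k <= 0) {
            return new float[0];
        }
        // Min-heap: the head is the weakest hit currently kept.
        PriorityQueue<Float> heap = new PriorityQueue<>();
        for (float s : scores) {
            if (heap.size() < k) {
                heap.add(s);             // queue not full yet: O(log k)
            } else if (s > heap.peek()) {
                heap.poll();             // evict the weakest kept hit
                heap.add(s);             // O(log k) per replacement
            }
        }
        // poll() yields the smallest remaining score, so fill from the back.
        float[] out = new float[heap.size()];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = heap.poll();
        }
        return out;
    }
}
```

Calling topK(scores, 100) keeps the heap small however many documents 
match, whereas topK(scores, 50000) on a query with about 50,000 matches 
does essentially the work of a full sort.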

Doug
