lucene-dev mailing list archives

From Doug Cutting <>
Subject Re: What's the purpose of hashing docid in BooleanScorer
Date Mon, 18 Oct 2004 21:04:20 GMT
Christoph Goller wrote:
> With the current scorer API one could get rid of BucketTable and
> advance all subscorers only by one document each time. I am not sure
> whether the BucketTable implementation is really much more efficient.
> I only see the advantage of inlining some of the and
> score.score code.

Indeed, sub-scorers could be, e.g., kept in a priority queue.  This is
done in ConjunctionScorer, PhraseScorer, etc.  However, this adds a
priority queue update to the inner search loop.  With long queries and
with common terms this overhead can be significant.  With short queries
and/or with rare terms the current ScoreTable-based implementation may
indeed be slower, but I believe that with longer queries containing
common terms it is substantially faster.

This algorithm is described in:

If we had a priority-queue-based implementation then we could benchmark 
these.  If we found that one were faster than the other for particular 
classes of queries then we could have a query optimizer which 
automatically selects the most efficient implementation...
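
To make the comparison concrete, here is a rough sketch of the two
strategies for a pure disjunction.  It is not the BooleanScorer code:
the class name DisjunctionSketch, the window size of 8, and the use of
sorted int arrays in place of real sub-scorers (each hit scoring 1.0)
are all assumptions made for illustration.  The table-based version
hashes each doc id into a small window and does no queue work per hit;
the queue-based version pays one queue update per hit.

```java
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

public class DisjunctionSketch {
    // Hypothetical window size; must be a power of two so "doc & (WINDOW - 1)" works as a hash.
    static final int WINDOW = 8;

    // Table-based: drain every posting list for one window of doc ids,
    // accumulating scores in an array slot chosen by hashing the doc id.
    // Assumes doc ids are non-negative and sorted ascending in each list.
    static Map<Integer, Float> bucketTable(int[][] postings) {
        Map<Integer, Float> out = new TreeMap<>();
        int[] pos = new int[postings.length];
        float[] score = new float[WINDOW];
        boolean[] hit = new boolean[WINDOW];
        boolean more = true;
        for (int base = 0; more; base += WINDOW) {
            more = false;
            int end = base + WINDOW;
            for (int i = 0; i < postings.length; i++) {
                int[] p = postings[i];
                while (pos[i] < p.length && p[pos[i]] < end) {
                    int slot = p[pos[i]] & (WINDOW - 1);
                    score[slot] += 1.0f;   // inner loop: no queue maintenance
                    hit[slot] = true;
                    pos[i]++;
                }
                if (pos[i] < p.length) more = true;
            }
            // Collect the window's hits and reset the table for the next window.
            for (int s = 0; s < WINDOW; s++) {
                if (hit[s]) {
                    out.put(base + s, score[s]);
                    score[s] = 0f;
                    hit[s] = false;
                }
            }
        }
        return out;
    }

    // Queue-based: keep one cursor per posting list in a priority queue
    // ordered by current doc id, and advance one document at a time.
    static Map<Integer, Float> priorityQueue(int[][] postings) {
        Map<Integer, Float> out = new TreeMap<>();
        // Each entry is {docId, listIndex, positionInList}.
        PriorityQueue<int[]> pq =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < postings.length; i++) {
            if (postings[i].length > 0) pq.add(new int[] {postings[i][0], i, 0});
        }
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            out.merge(top[0], 1.0f, Float::sum);
            int next = top[2] + 1;
            if (next < postings[top[1]].length) {
                // The queue update paid once per hit in the inner loop.
                pq.add(new int[] {postings[top[1]][next], top[1], next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] postings = {{1, 3, 9}, {3, 4, 20}, {3, 9}};
        System.out.println(bucketTable(postings));
        System.out.println(priorityQueue(postings));
    }
}
```

Both methods should return the same doc-to-score map, so a benchmark
only needs to time them over posting lists of varying number and
density.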

