lucene-dev mailing list archives

From Paul Elschot <>
Subject Re: What's the purpose of hashing docid in BooleanScorer; DisjunctionScorer
Date Mon, 18 Oct 2004 21:16:01 GMT
On Monday 18 October 2004 23:04, Doug Cutting wrote:
> Christoph Goller wrote:
> > With the current scorer API one could get rid of the BucketTable and
> > advance each sub-scorer by only one document at a time. I am not sure
> > whether the BucketTable implementation is really much more efficient.
> > I only see the advantage of inlining some of the and
> > score.score code.
> Indeed, sub-scorers could be, e.g., kept in a priority queue.  This is
> done in ConjunctionScorer, PhraseScorer, etc.  However this adds a
> priority queue update to the inner search loop.  With long queries and
> with common terms this overhead can be significant.  With short queries
> and/or with rare terms the current ScoreTable-based implementation may
> indeed be slower, but I believe with longer queries containing common
> terms it is substantially faster.
> This algorithm is described in:
> If we had a priority-queue-based implementation then we could benchmark
> these.  If we found that one were faster than the other for particular
> classes of queries then we could have a query optimizer which
> automatically selects the most efficient implementation...

I have a DisjunctionScorer based on a PriorityQueue lying around,
but I can't benchmark it myself at the moment. In case there is
interest, I'll gladly adapt it to and 
add it in bugzilla.
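
For readers following the thread, the priority-queue approach discussed above can be sketched roughly as follows. This is an illustration only, not the DisjunctionScorer mentioned in this message and not Lucene's scorer API: the names PqDisjunctionSketch, SubScorer, Hit and scoreAll are hypothetical, and each sub-scorer is assumed to be a sorted array of doc ids with one constant score.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch (not Lucene's API): OR-ing sub-scorers with a priority queue. */
final class PqDisjunctionSketch {

    /** Assumed stand-in for a sub-scorer: sorted doc ids, one constant score. */
    static final class SubScorer {
        private final int[] docs;
        private final float score;
        private int pos = 0;

        SubScorer(int[] docs, float score) {
            this.docs = docs;
            this.score = score;
        }

        boolean exhausted() { return pos >= docs.length; }
        int doc() { return docs[pos]; }
        void next() { pos++; }
    }

    /** One matching document: summed score and number of matching sub-scorers. */
    static final class Hit {
        final int doc;
        final float score;
        final int matchers;

        Hit(int doc, float score, int matchers) {
            this.doc = doc;
            this.score = score;
            this.matchers = matchers;
        }
    }

    /** Visits documents in increasing doc-id order, one document per outer iteration. */
    static List<Hit> scoreAll(List<SubScorer> subs) {
        // The queue is ordered by each sub-scorer's current doc id.
        PriorityQueue<SubScorer> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
        for (SubScorer s : subs) {
            if (!s.exhausted()) {
                queue.add(s);
            }
        }
        List<Hit> hits = new ArrayList<>();
        while (!queue.isEmpty()) {
            int doc = queue.peek().doc();
            float sum = 0f;
            int matchers = 0;
            // Pop every sub-scorer positioned on this doc, advance it, and re-insert it.
            // This queue update per posting is the inner-loop overhead discussed above.
            while (!queue.isEmpty() && queue.peek().doc() == doc) {
                SubScorer s = queue.poll();
                sum += s.score;
                matchers++;
                s.next();
                if (!s.exhausted()) {
                    queue.add(s);
                }
            }
            hits.add(new Hit(doc, sum, matchers));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<SubScorer> subs = Arrays.asList(
            new SubScorer(new int[] {1, 3, 5}, 1.0f),
            new SubScorer(new int[] {3, 4}, 2.0f));
        for (Hit h : scoreAll(subs)) {
            System.out.println(h.doc + " " + h.score + " " + h.matchers);
        }
    }
}
```

Each posting costs one poll and usually one add on the queue, so the per-document work grows with the logarithm of the number of sub-scorers. That is presumably where a table-based implementation could win on long queries with common terms, though only a benchmark would tell.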

Paul Elschot
