lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Updated: (LUCENE-2686) DisjunctionSumScorer should not call .score on sub scorers until consumer calls .score
Date Sat, 09 Oct 2010 13:01:32 GMT


Michael McCandless updated LUCENE-2686:

    Attachment: LUCENE-2686.patch

So... the good news is I made a new scorer (basically copied DisjunctionMaxScorer and then
tweaked from there) that scores the OR-only case.  All tests pass w/ this new scorer.

And more good news is that if you don't score (I sort by doctitle to do that), you get a speedup
-- 7.7% in my simplistic test (prefix query unit*, expands to 988 terms, but I force it to
do a scoring BQ rewrite, plus force it to use BS2 not BS -- the nocommits in the patch).

But the bad news is that with scoring on, it's 22.7% slower!

And, the weird news is, I discovered accidentally that BS2 is much (> 2X) faster for this
one query.  I think we need to modify the criteria that decide whether to use BS or BS2...
maybe when there are lots of lowish-docFreq terms, BS2 is better?

> DisjunctionSumScorer should not call .score on sub scorers until consumer calls .score
> --------------------------------------------------------------------------------------
>                 Key: LUCENE-2686
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.1, 4.0
>         Attachments: LUCENE-2686.patch,
> Spinoff from java-user thread "question about Scorer.freq()" from Koji...
> BooleanScorer2 uses DisjunctionSumScorer to score only-SHOULD-clause boolean queries.
> But, this scorer does too much work for collectors that never call .score, because it
> scores while it's matching.  It should only call .score on the subs when the caller calls
> its .score.
> This also has the side effect of messing up advanced collectors that gather the freq()
> of the subs (using LUCENE-2590).
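
The idea in the description above (match first, score only on demand) can be illustrated with a minimal sketch. This is not the attached patch and not Lucene's Scorer API: SubScorer, CountingSubScorer and LazyDisjunctionScorer are hypothetical names made up for this example, and a linear scan stands in for whatever structure a real scorer would use to track its subs.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified stand-in for a sub-scorer; NOT Lucene's Scorer API. */
interface SubScorer {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int docID();
    int nextDoc();
    float score();
}

/** Sub-scorer over a sorted list of doc IDs; counts how often score() is called. */
final class CountingSubScorer implements SubScorer {
    private final int[] docs;
    private final float weight;
    private int pos = -1;
    int scoreCalls = 0;

    CountingSubScorer(float weight, int... docs) {
        this.weight = weight;
        this.docs = docs;
    }

    public int docID() {
        if (pos < 0) return -1;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    public int nextDoc() {
        if (pos < docs.length) pos++;
        return docID();
    }

    public float score() {
        scoreCalls++;
        return weight;
    }
}

/** OR-only scorer: nextDoc() only matches, score() touches the subs lazily. */
final class LazyDisjunctionScorer {
    private final List<? extends SubScorer> subs;
    private int doc = -1;

    LazyDisjunctionScorer(List<? extends SubScorer> subs) {
        this.subs = subs;
    }

    /** Advances every sub sitting on the current doc, then moves to the smallest doc ID. */
    int nextDoc() {
        int min = SubScorer.NO_MORE_DOCS;
        for (SubScorer s : subs) {
            if (s.docID() == doc) s.nextDoc();
            min = Math.min(min, s.docID());
        }
        doc = min;
        return doc;
    }

    /** Sums the scores of the subs on the current doc; nothing is scored before this call. */
    float score() {
        float sum = 0f;
        for (SubScorer s : subs) {
            if (s.docID() == doc) sum += s.score();
        }
        return sum;
    }
}

class LazyScoringDemo {
    public static void main(String[] args) {
        CountingSubScorer a = new CountingSubScorer(1.0f, 1, 3);
        CountingSubScorer b = new CountingSubScorer(2.0f, 3, 5);
        LazyDisjunctionScorer or = new LazyDisjunctionScorer(Arrays.asList(a, b));
        // A collector that never asks for scores: the subs are never scored.
        while (or.nextDoc() != SubScorer.NO_MORE_DOCS) { /* collect doc only */ }
        System.out.println("score() calls on subs: " + (a.scoreCalls + b.scoreCalls));
    }
}
```

Because the subs stay positioned on the current doc until the next nextDoc() call, a collector could in principle also inspect per-sub state (such as a freq) at collect time in a design like this; whether that matches what LUCENE-2590 needs is an assumption here.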

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
