lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4571) speedup disjunction with minShouldMatch
Date Fri, 22 Mar 2013 13:57:16 GMT


Robert Muir commented on LUCENE-4571:

Thanks Stefan! I can confirm its passing tests and I'm seeing similar benchmark results (on
iteration #9 now)

the benchmark takes some time on my computer: I'll report back what I see when its complete.

There are still some assertions failing, but for now I'd attribute them to problems with the
test (are return statements is missing when actual==null?), because these assertions only
fail within the baseline/expected impl. Lucenebench doesn't complain anymore, even when checking

Awesome! Thats definitely a test bug in assertNext()... its just very unlikely to happen (means
that by rare chance, no documents got term "j" or whatever): so it hasn't tripped up in Jenkins.
I'll fix this, thanks!

Instead of re-implementing all of the previous basic implementation, do you think it would
be possible to rather re-use a normal DisjunctionSumScorer and wrap it with a Scorer that
discards candidates that don't match the mm-constraint? This way we would get correct scoring
for free. E.g.

We could: I thought about this. However I did this slow one for a few reasons:
* doesn't prevent any refactoring of core/ code (its a standalone thing just for testing)
* should make things pretty easy to debug (this isnt yet done though)
* possibility of finding bugs in the current implementation. especially as we near a bugfix
release :)

The correct scoring is actually easy to add. But I didnt do this yet because I didnt want
to hold up this issue on "the perfect test". To me the important thing here is matching the
correct documents according to minNrShouldMatch. If this is correct, its pretty intuitive
to me that scoring is correct too: wherever we increase nrMatchers there should also be a
addition to the score.

But I'll improve this today so we all have that warm fuzzy feeling!

As far as I'm concerned this patch is ready to be committed today. I'll follow up with my
benchmark results in a comment and improve 
the tests a bit this morning though.

Thanks a lot for taking this on... awesome work.
> speedup disjunction with minShouldMatch 
> ----------------------------------------
>                 Key: LUCENE-4571
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 4.1
>            Reporter: Mikhail Khludnev
>         Attachments: LUCENE-4571.patch, LUCENE-4571.patch, LUCENE-4571.patch, LUCENE-4571.patch,
LUCENE-4571.patch, LUCENE-4571.patch
> even minShouldMatch is supplied to DisjunctionSumScorer it enumerates whole disjunction,
and verifies minShouldMatch condition [on every doc|]:
> {code}
>   public int nextDoc() throws IOException {
>     assert doc != NO_MORE_DOCS;
>     while(true) {
>       while (subScorers[0].docID() == doc) {
>         if (subScorers[0].nextDoc() != NO_MORE_DOCS) {
>           heapAdjust(0);
>         } else {
>           heapRemoveRoot();
>           if (numScorers < minimumNrMatchers) {
>             return doc = NO_MORE_DOCS;
>           }
>         }
>       }
>       afterNext();
>       if (nrMatchers >= minimumNrMatchers) {
>         break;
>       }
>     }
>     return doc;
>   }
> {code}
> [~spo] proposes (as well as I get it) to pop nrMatchers-1 scorers from the heap first,
and then push them back advancing behind that top doc. For me the question no.1 is there a
performance test for minShouldMatch constrained disjunction. 

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message