lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4571) speedup disjunction with minShouldMatch
Date Mon, 26 Nov 2012 22:14:59 GMT


Robert Muir commented on LUCENE-4571:

I like the ideas Stefan!

Using BooleanScorer will probably only give improvement by a factor (in some cases), but also
this scorers would still generate all these useless candidates.

It might be hard to beat this one (which already supports minShouldMatch as a side effect
of its coord computation anyway i think) in cases with
lots of terms because I feel like inevitably supporting advance() on the subscorers will result
in more heap operations/cpu (at least when i messed around on paper it seemed this way).

In fact in some situations I think even this one should be used when there are mandatory clauses:
in any case booleanweight should have better heuristics to decide, limited by what the collector
can deal with.

Collectors that support out of order scoring today I think are likely a lot faster with these
minShouldMatch types of queries than BooleanScorer2. Thats why i brought it up: it could be
an easy win for e.g. solr users.

But when the number of terms is smallish and one is super-common, its silly we don't try to
improve the BS2 impl to at least try to handle the worst case :) 

At least for the sake of simplicity (not necessarily performance) I still think it would be
good to factor minshouldmatch behavior out of disjunctionsumscorer if we try to make it use
advance() on subscorers.

> speedup disjunction with minShouldMatch 
> ----------------------------------------
>                 Key: LUCENE-4571
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 4.1
>            Reporter: Mikhail Khludnev
> even minShouldMatch is supplied to DisjunctionSumScorer it enumerates whole disjunction,
and verifies minShouldMatch condition [on every doc|]:
> {code}
>   public int nextDoc() throws IOException {
>     assert doc != NO_MORE_DOCS;
>     while(true) {
>       while (subScorers[0].docID() == doc) {
>         if (subScorers[0].nextDoc() != NO_MORE_DOCS) {
>           heapAdjust(0);
>         } else {
>           heapRemoveRoot();
>           if (numScorers < minimumNrMatchers) {
>             return doc = NO_MORE_DOCS;
>           }
>         }
>       }
>       afterNext();
>       if (nrMatchers >= minimumNrMatchers) {
>         break;
>       }
>     }
>     return doc;
>   }
> {code}
> [~spo] proposes (as well as I get it) to pop nrMatchers-1 scorers from the heap first,
and then push them back advancing behind that top doc. For me the question no.1 is there a
performance test for minShouldMatch constrained disjunction. 

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message