lucene-dev mailing list archives

From "Mikhail Khludnev (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4571) speedup disjunction with minShouldMatch
Date Fri, 23 Nov 2012 21:10:58 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503281#comment-13503281 ]

Mikhail Khludnev commented on LUCENE-4571:
------------------------------------------

It was a bad idea to reply to JIRA's mail. Moving the dialogue here:

[~mkhludnev]
{quote}
Robert, am I right that establishing the perf test is the first necessary step, rather than
the implementation itself?
Also (not really important, but let me mention it), what I'm really looking for is a disjunction
query with a user-supplied verification strategy, where minShouldMatch is just one of the
ways to verify a match.
{quote}

[~rcmuir]
{quote}
Right, the best way to do this is to extend luceneutil (http://code.google.com/a/apache-extras.org/p/luceneutil)
to test this case.

Keep in mind that I'd also be interested to see how BooleanScorer compares to BooleanScorer2
for this situation. I already mentioned on the Solr list (nobody replied) that Solr *never*
gets BooleanScorer, but from time to time I hear Solr users complaining about BooleanScorer2's
performance for min-should-match.

So when trying to improve the performance of min-should-match, I think a very early step should
be to see if we already have a better-performing alternative that is just not being used.
If that's the case, then the best solution is to fix Solr's collectors to be able to cope with
BooleanScorer.

Intuitively, I think it's going to be like everything else: BS1 is better in some situations,
BS2 in others.

>>> Also (not really important, but let me mention it), what I'm really looking for
is a disjunction query with a user-supplied verification strategy, where minShouldMatch
is just one of the ways to verify a match.

I don't think our concrete scorers should have such a hook: they should be as dead simple
as possible.

If you want to do this, I recommend just extending the abstract DisjunctionScorer. (Currently
DisjunctionSum and DisjunctionMax extend it. As I suggested, we should think about splitting
out a MinShouldMatchScorer as well: it's confusing that pure disjunctions are all mixed up
with min-should-match, and the algorithms should actually work differently.)
{quote}
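
To make the "user-supplied verification strategy" idea concrete, here is a minimal, self-contained sketch. It does not use Lucene's API: sorted int arrays stand in for the sub-scorers, and `MatchVerifier`, `VerifiedDisjunction`, and `search` are hypothetical names invented for illustration. It only shows minShouldMatch as one verifier among several possible ones; a real implementation would presumably be a subclass of the abstract DisjunctionScorer, as suggested above.

```java
import java.util.ArrayList;
import java.util.List;

class VerifiedDisjunction {
    /** Hypothetical hook (not a Lucene type): decides whether a candidate doc is a match. */
    interface MatchVerifier {
        boolean verify(int doc, boolean[] matched, int matchCount);
    }

    /** minShouldMatch expressed as just one possible verifier. */
    static MatchVerifier minShouldMatch(int min) {
        return (doc, matched, matchCount) -> matchCount >= min;
    }

    /**
     * Walks the union of the sorted doc-id lists and keeps the docs the verifier accepts.
     * Assumes every doc id is smaller than Integer.MAX_VALUE.
     */
    static List<Integer> search(int[][] postings, MatchVerifier verifier) {
        int[] pos = new int[postings.length];
        List<Integer> hits = new ArrayList<>();
        while (true) {
            // Smallest doc id any clause is currently positioned on.
            int doc = Integer.MAX_VALUE;
            for (int i = 0; i < postings.length; i++) {
                if (pos[i] < postings[i].length) {
                    doc = Math.min(doc, postings[i][pos[i]]);
                }
            }
            if (doc == Integer.MAX_VALUE) {
                return hits; // every list is exhausted
            }
            // Record which clauses match this doc and move them forward.
            boolean[] matched = new boolean[postings.length];
            int count = 0;
            for (int i = 0; i < postings.length; i++) {
                if (pos[i] < postings[i].length && postings[i][pos[i]] == doc) {
                    matched[i] = true;
                    count++;
                    pos[i]++;
                }
            }
            if (verifier.verify(doc, matched, count)) {
                hits.add(doc);
            }
        }
    }
}
```

A caller could pass `VerifiedDisjunction.minShouldMatch(2)`, or any other lambda, for example one that requires a particular clause to be among the matches.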


                
> speedup disjunction with minShouldMatch 
> ----------------------------------------
>
>                 Key: LUCENE-4571
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4571
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 4.1
>            Reporter: Mikhail Khludnev
>
> Even when minShouldMatch is supplied to DisjunctionSumScorer, it enumerates the whole disjunction
and verifies the minShouldMatch condition [on every doc|https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/DisjunctionSumScorer.java#L70]:
> {code}
>   public int nextDoc() throws IOException {
>     assert doc != NO_MORE_DOCS;
>     while(true) {
>       while (subScorers[0].docID() == doc) {
>         if (subScorers[0].nextDoc() != NO_MORE_DOCS) {
>           heapAdjust(0);
>         } else {
>           heapRemoveRoot();
>           if (numScorers < minimumNrMatchers) {
>             return doc = NO_MORE_DOCS;
>           }
>         }
>       }
>       afterNext();
>       if (nrMatchers >= minimumNrMatchers) {
>         break;
>       }
>     }
>     
>     return doc;
>   }
> {code}
> [~spo] proposes (as far as I understand it) to pop nrMatchers-1 scorers from the heap first,
and then push them back after advancing them past that top doc. For me, question no. 1 is: is there a
performance test for minShouldMatch-constrained disjunctions?
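
The skipping idea can be sketched on a toy model, under stated assumptions. This is one reading of the proposal, not the actual patch: `Postings` is a hypothetical stand-in for a sub-scorer (it starts positioned on its first doc, unlike Lucene scorers), and the array is re-sorted on every step where a real scorer would maintain a heap. With the iterators ordered by current doc, no doc below the current doc of the `min`-th iterator can collect `min` matches, so the `min - 1` lagging iterators can jump straight to it. The `naive` method mirrors the enumerate-and-verify behaviour described above, for comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class MinShouldMatchSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Toy stand-in for a sub-scorer: iterates one sorted doc-id list. */
    static final class Postings {
        private final int[] docs;
        private int pos = 0;

        Postings(int... docs) { this.docs = docs; }

        int docID() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }

        /** Moves to the first doc >= target (linear here; real postings can skip). */
        void advance(int target) {
            while (pos < docs.length && docs[pos] < target) pos++;
        }
    }

    /** Baseline: visit every doc of the disjunction and check the match count. */
    static List<Integer> naive(Postings[] subs, int min) {
        List<Integer> hits = new ArrayList<>();
        while (true) {
            int doc = NO_MORE_DOCS;
            for (Postings p : subs) doc = Math.min(doc, p.docID());
            if (doc == NO_MORE_DOCS) return hits;
            int count = 0;
            for (Postings p : subs) {
                if (p.docID() == doc) { count++; p.advance(doc + 1); }
            }
            if (count >= min) hits.add(doc);
        }
    }

    /** Skipping variant; assumes 1 <= min <= subs.length. Reorders the subs array. */
    static List<Integer> skipping(Postings[] subs, int min) {
        List<Integer> hits = new ArrayList<>();
        while (true) {
            Arrays.sort(subs, Comparator.comparingInt(Postings::docID));
            // Docs below this one are reachable by at most min-1 iterators.
            int candidate = subs[min - 1].docID();
            if (candidate == NO_MORE_DOCS) return hits; // fewer than min lists remain
            // The min-1 lagging iterators skip directly to the candidate.
            for (int i = 0; i < min - 1; i++) subs[i].advance(candidate);
            int count = 0;
            for (Postings p : subs) if (p.docID() == candidate) count++;
            if (count >= min) hits.add(candidate);
            // Move everything positioned on the candidate past it.
            for (Postings p : subs) if (p.docID() == candidate) p.advance(candidate + 1);
        }
    }
}
```

Both methods should return the same hits for the same input; the difference is how many intermediate docs the lagging iterators have to visit, which is exactly what a performance test would need to measure.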


