lucene-dev mailing list archives

From Mikhail Khludnev <mkhlud...@griddynamics.com>
Subject Re: [jira] [Commented] (LUCENE-4571) speedup disjunction with minShouldMatch
Date Fri, 23 Nov 2012 16:00:32 GMT
Robert, am I right that establishing the perf test is the first necessary
step, rather than the implementation itself?
Also (not really important, but let me mention it), what I'm really looking
for is a disjunction query with a user-supplied verification strategy,
where minShouldMatch is just one of the ways to verify a match.
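To make the idea concrete, here is a minimal sketch in plain Java (no Lucene classes) of a disjunction that delegates the match decision to a user-supplied strategy. MatchVerifier, search() and the in-memory int[][] posting lists are hypothetical names chosen for illustration, not existing Lucene APIs. minShouldMatch is expressed as just one MatchVerifier.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Sketch only: hypothetical names, plain Java, no Lucene dependency.
final class VerifiedDisjunctionSketch {

  /** Hypothetical strategy: decides whether a doc matches, given the clauses that hit it. */
  interface MatchVerifier {
    boolean matches(int doc, BitSet matchedClauses);
  }

  /** minShouldMatch expressed as just one possible verifier. */
  static MatchVerifier minShouldMatch(int m) {
    return (doc, matched) -> matched.cardinality() >= m;
  }

  /**
   * Enumerates the whole disjunction doc by doc (as DisjunctionSumScorer.nextDoc()
   * does) and lets the verifier accept or reject each candidate doc.
   * postings[i] holds clause i's doc IDs, sorted ascending.
   */
  static List<Integer> search(int[][] postings, MatchVerifier verifier) {
    int[] pos = new int[postings.length]; // cursor into each posting list
    List<Integer> hits = new ArrayList<>();
    while (true) {
      // Smallest unvisited doc across all clauses (linear scan instead of a heap).
      int doc = Integer.MAX_VALUE;
      for (int i = 0; i < postings.length; i++) {
        if (pos[i] < postings[i].length) doc = Math.min(doc, postings[i][pos[i]]);
      }
      if (doc == Integer.MAX_VALUE) return hits; // every clause is exhausted
      // Record which clauses contain this doc and step their cursors past it.
      BitSet matched = new BitSet(postings.length);
      for (int i = 0; i < postings.length; i++) {
        if (pos[i] < postings[i].length && postings[i][pos[i]] == doc) {
          matched.set(i);
          pos[i]++;
        }
      }
      if (verifier.matches(doc, matched)) hits.add(doc);
    }
  }

  public static void main(String[] args) {
    int[][] postings = {{1, 3, 5, 7}, {3, 4, 7, 9}, {5, 7, 9, 11}};
    System.out.println(search(postings, minShouldMatch(2))); // [3, 5, 7, 9]
    // A different rule: clause 0 is required, plus at least one other clause.
    MatchVerifier custom = (doc, matched) -> matched.get(0) && matched.cardinality() >= 2;
    System.out.println(search(postings, custom)); // [3, 5, 7]
  }
}
```

One design consequence, under this sketch's assumptions: an arbitrary verifier forces the scorer to visit every doc in the union, whereas a count-based rule like minShouldMatch gives a bound that lets the scorer skip ahead with advance(). A real API would probably need an optional hook exposing that kind of bound.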
On 23.11.2012 19:50, "Robert Muir (JIRA)" <jira@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/LUCENE-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503234#comment-13503234]
>
> Robert Muir commented on LUCENE-4571:
> -------------------------------------
>
> I agree we should try to use advance() with this scorer. If it has, say, 3
> terms and one of them is a very common
> term (e.g. a stopword-type term), it will drag the entire query down.
>
> {quote}
> For me, question no. 1 is whether there is a performance test for
> minShouldMatch-constrained disjunction.
> {quote}
>
> No, there currently isn't one in luceneutil.
>
> Also, it would be good to think about separating ordinary disjunctions
> from disjunctions-with-minNrShouldMatch
> in BooleanScorer2. The simple disjunction case could then be
> optimized further more easily (e.g. deferring score() until necessary,
> and so on).
>
>
> > speedup disjunction with minShouldMatch
> > ----------------------------------------
> >
> >                 Key: LUCENE-4571
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-4571
> >             Project: Lucene - Core
> >          Issue Type: Improvement
> >          Components: core/search
> >    Affects Versions: 4.1
> >            Reporter: Mikhail Khludnev
> >
> > Even when minShouldMatch is supplied to DisjunctionSumScorer, it enumerates
> > the whole disjunction and verifies the minShouldMatch condition
> > [on every doc|https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/DisjunctionSumScorer.java#L70]:
> > {code}
> >   public int nextDoc() throws IOException {
> >     assert doc != NO_MORE_DOCS;
> >     while(true) {
> >       while (subScorers[0].docID() == doc) {
> >         if (subScorers[0].nextDoc() != NO_MORE_DOCS) {
> >           heapAdjust(0);
> >         } else {
> >           heapRemoveRoot();
> >           if (numScorers < minimumNrMatchers) {
> >             return doc = NO_MORE_DOCS;
> >           }
> >         }
> >       }
> >       afterNext();
> >       if (nrMatchers >= minimumNrMatchers) {
> >         break;
> >       }
> >     }
> >
> >     return doc;
> >   }
> > {code}
> > [~spo] proposes (as far as I understand it) to pop nrMatchers-1 scorers from
> > the heap first, and then push them back, advancing them to that top doc. For
> > me, question no. 1 is whether there is a performance test for
> > minShouldMatch-constrained disjunction.

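For reference, the advance()-based idea discussed in the quoted issue can be sketched roughly as follows. This is a minimal, self-contained Java illustration under stated assumptions, not DisjunctionSumScorer or any committed LUCENE-4571 code: it reads "nrMatchers-1" in [~spo]'s proposal as minimumNrMatchers-1 (called minShouldMatch here), it uses in-memory int[] posting lists whose linear-scan advance() stands in for Lucene's skip lists, and all class and method names are hypothetical. The lagging iterators are popped from the heap, advanced straight to the minShouldMatch-th smallest doc ID, and pushed back, so a stopword-like clause is advance()d over docs where the threshold cannot be met.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch only: hypothetical names, plain Java, no Lucene classes.
final class MinShouldMatchSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** In-memory posting list; the linear-scan advance() stands in for skip lists. */
  static final class PostingIterator {
    private final int[] docs; // sorted ascending, no duplicates
    private int idx = -1;
    private int doc = -1; // -1 = not yet positioned

    PostingIterator(int[] docs) { this.docs = docs; }

    int docID() { return doc; }

    int nextDoc() {
      idx++;
      return doc = idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    /** Moves to the first doc >= target; callers pass target > docID(). */
    int advance(int target) {
      do { idx++; } while (idx < docs.length && docs[idx] < target);
      return doc = idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }
  }

  private final PriorityQueue<PostingIterator> heap =
      new PriorityQueue<>(Comparator.comparingInt(PostingIterator::docID));
  private final int minShouldMatch;
  private int doc = -1;

  MinShouldMatchSketch(int minShouldMatch, int[]... postings) {
    if (minShouldMatch < 1) throw new IllegalArgumentException("minShouldMatch must be >= 1");
    this.minShouldMatch = minShouldMatch;
    for (int[] p : postings) heap.add(new PostingIterator(p));
  }

  int nextDoc() {
    // Step every iterator sitting on the current doc; drop exhausted ones.
    stepPast(doc);
    while (true) {
      if (heap.size() < minShouldMatch) return doc = NO_MORE_DOCS;
      // Pop the minShouldMatch-1 lagging iterators. The new heap top holds the
      // minShouldMatch-th smallest docID; no smaller doc can reach the threshold.
      List<PostingIterator> lagging = new ArrayList<>(minShouldMatch - 1);
      for (int i = 0; i < minShouldMatch - 1; i++) lagging.add(heap.poll());
      int candidate = heap.peek().docID();
      // advance() the laggers straight to the candidate instead of calling
      // nextDoc() on every doc in between, then push them back.
      for (PostingIterator it : lagging) {
        if (it.docID() < candidate) it.advance(candidate);
        if (it.docID() != NO_MORE_DOCS) heap.add(it);
      }
      // Count iterators positioned on the candidate (linear scan, fine for a sketch).
      int matchers = 0;
      for (PostingIterator it : heap) if (it.docID() == candidate) matchers++;
      if (matchers >= minShouldMatch) return doc = candidate;
      // Too few matchers: move the ones on the candidate past it and retry.
      stepPast(candidate);
    }
  }

  private void stepPast(int target) {
    while (!heap.isEmpty() && heap.peek().docID() == target) {
      PostingIterator it = heap.poll();
      if (it.nextDoc() != NO_MORE_DOCS) heap.add(it);
    }
  }

  static List<Integer> collect(int minShouldMatch, int[]... postings) {
    MinShouldMatchSketch s = new MinShouldMatchSketch(minShouldMatch, postings);
    List<Integer> hits = new ArrayList<>();
    for (int d = s.nextDoc(); d != NO_MORE_DOCS; d = s.nextDoc()) hits.add(d);
    return hits;
  }

  public static void main(String[] args) {
    int[] common = new int[20]; // docs 0..19: a stopword-like term
    for (int i = 0; i < common.length; i++) common[i] = i;
    int[] rare1 = {3, 7, 15};
    int[] rare2 = {7, 15, 18};
    System.out.println(collect(2, common, rare1, rare2)); // [3, 7, 15, 18]
  }
}
```

In the example, the 20-doc common list ends up positioned on only 9 of its docs (0, 3, 4, 7, 8, 15, 16, 18, 19). The rest are skipped inside advance(), which a skip-list-backed implementation could make cheap. Whether that pays off in practice is exactly what a luceneutil benchmark would need to show.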