lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: [jira] [Commented] (LUCENE-4571) speedup disjunction with minShouldMatch
Date Fri, 23 Nov 2012 16:18:51 GMT
On Fri, Nov 23, 2012 at 8:00 AM, Mikhail Khludnev <
mkhludnev@griddynamics.com> wrote:

> Robert, am I right that establishing the perf test is the first necessary
> step, rather than the implementation itself?
>
Right, the best way to do this is to extend luceneutil (
http://code.google.com/a/apache-extras.org/p/luceneutil) to test this case.

Keep in mind that I'd also be interested to see how BooleanScorer compares
to BooleanScorer2 for this situation. I already mentioned on the Solr list
(nobody replied) that Solr *never* gets BooleanScorer, but from time to
time I hear Solr users complaining about BooleanScorer2's performance for
min-should-match.

So when trying to improve the performance of min-should-match, I think a
very early step should be to see if we already have a better-performing
alternative that is just not being used: if that's the case, then the best
solution is to fix Solr's collectors to be able to cope with BooleanScorer.

Intuitively, I think it's going to be like everything else: BS1
(BooleanScorer) is better in some situations, BS2 (BooleanScorer2) in others.
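To make the BS1 vs BS2 comparison more concrete: roughly speaking, BooleanScorer (BS1) works window-at-a-time, letting every clause drop its matching docs for a block of doc IDs into a bucket table and then emitting the qualifying buckets, while BooleanScorer2 (BS2) merges the clauses doc-at-a-time. The following is a minimal, self-contained sketch of the window-at-a-time idea applied to min-should-match, assuming postings are plain sorted int arrays. WindowedMinShouldMatch, match() and the 2048-doc window are illustrative assumptions, not Lucene's actual code. Because this sketch emits hits in the order their buckets were first touched, docs within a window can come out of order, which is the kind of property a collector must tolerate before it can be paired with such a scorer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of window-at-a-time min-should-match (BS1-style idea).
// Not Lucene code: postings are plain sorted int[] arrays, one per clause.
public class WindowedMinShouldMatch {
    // Illustrative window size; an assumption for this sketch.
    static final int WINDOW = 2048;

    public static List<Integer> match(int[][] postings, int minShouldMatch) {
        if (minShouldMatch < 1) {
            throw new IllegalArgumentException("minShouldMatch must be >= 1");
        }
        int[] counts = new int[WINDOW];   // per-slot count of matching clauses
        int[] touched = new int[WINDOW];  // slots hit in this window, in first-hit order
        int[] cursors = new int[postings.length];
        List<Integer> hits = new ArrayList<>();

        int maxDoc = 0;
        for (int[] p : postings) {
            if (p.length > 0) maxDoc = Math.max(maxDoc, p[p.length - 1] + 1);
        }

        for (int base = 0; base < maxDoc; base += WINDOW) {
            int end = base + WINDOW;
            int numTouched = 0;
            // Each clause dumps all of its docs in [base, end) into the buckets.
            for (int i = 0; i < postings.length; i++) {
                int[] p = postings[i];
                while (cursors[i] < p.length && p[cursors[i]] < end) {
                    int slot = p[cursors[i]++] - base;
                    if (counts[slot]++ == 0) touched[numTouched++] = slot;
                }
            }
            // Emit qualifying docs in first-touch order (may be out of doc order),
            // resetting only the slots that were used.
            for (int t = 0; t < numTouched; t++) {
                int slot = touched[t];
                if (counts[slot] >= minShouldMatch) hits.add(base + slot);
                counts[slot] = 0;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[][] postings = { {7, 3000}, {3, 7, 2500}, {3, 3000} };
        System.out.println(match(postings, 2));
    }
}
```

With the postings in main(), this prints [7, 3, 3000]: doc 7 is emitted before doc 3 because clause 0 touched its bucket first.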

> Also (not really important, but let me mention it), what I'm really looking
> for is a disjunction query with a user-supplied verification strategy,
> where minShouldMatch is just one of the ways to verify a match.
>
I don't think our concrete scorers should have such a hook: they should be
as dead simple as possible.

If you want to do this, I recommend just extending the abstract
DisjunctionScorer (currently DisjunctionSum and DisjunctionMax extend it).
As I suggested, we should think about splitting out a MinShouldMatchScorer
as well: it's confusing that pure disjunctions are all mixed up with
min-should-match, and the algorithms should actually work differently.
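The approach suggested here, keeping the concrete scorers simple and putting min-should-match (or any other acceptance rule) in its own subclass of an abstract disjunction, might look roughly like the following. This is a hypothetical, self-contained illustration, not Lucene's DisjunctionScorer API: AbstractDisjunction owns a doc-at-a-time merge of sorted postings using java.util.PriorityQueue, and subclasses only override accept(doc, matchingClauses). DisjunctionSketch, PureDisjunction, MinShouldMatchDisjunction and Cursor are made-up names; a user-supplied verification strategy would simply be one more subclass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch; class and method names are illustrative, not Lucene's.
public class DisjunctionSketch {

    // Cursor over one clause's sorted doc ids (stands in for a sub-scorer).
    private static final class Cursor {
        final int[] docs;
        int pos;
        Cursor(int[] docs) { this.docs = docs; }
        int doc() { return docs[pos]; }
        boolean exhausted() { return pos >= docs.length; }
    }

    // Abstract base: owns the doc-at-a-time merge via a min-heap on current doc.
    public abstract static class AbstractDisjunction {
        private final PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));

        protected AbstractDisjunction(int[][] postings) {
            for (int[] p : postings) {
                if (p.length > 0) heap.add(new Cursor(p));
            }
        }

        // Subclasses decide whether a doc matched by this many clauses is a hit.
        protected abstract boolean accept(int doc, int matchingClauses);

        public List<Integer> collectAll() {
            List<Integer> hits = new ArrayList<>();
            while (!heap.isEmpty()) {
                int doc = heap.peek().doc();
                List<Cursor> onDoc = new ArrayList<>();
                // Pop every cursor positioned on the current (smallest) doc.
                while (!heap.isEmpty() && heap.peek().doc() == doc) {
                    onDoc.add(heap.poll());
                }
                if (accept(doc, onDoc.size())) hits.add(doc);
                // Advance popped cursors outside the heap, then re-insert live ones.
                for (Cursor c : onDoc) {
                    c.pos++;
                    if (!c.exhausted()) heap.add(c);
                }
            }
            return hits;
        }
    }

    // Pure disjunction: any matching clause is enough.
    public static final class PureDisjunction extends AbstractDisjunction {
        public PureDisjunction(int[][] postings) { super(postings); }
        @Override protected boolean accept(int doc, int matchingClauses) { return true; }
    }

    // Min-should-match as its own subclass rather than a flag on the base class.
    public static final class MinShouldMatchDisjunction extends AbstractDisjunction {
        private final int minShouldMatch;
        public MinShouldMatchDisjunction(int[][] postings, int minShouldMatch) {
            super(postings);
            this.minShouldMatch = minShouldMatch;
        }
        @Override protected boolean accept(int doc, int matchingClauses) {
            return matchingClauses >= minShouldMatch;
        }
    }

    public static void main(String[] args) {
        int[][] postings = { {7, 3000}, {3, 7, 2500}, {3, 3000} };
        System.out.println(new PureDisjunction(postings).collectAll());
        System.out.println(new MinShouldMatchDisjunction(postings, 2).collectAll());
    }
}
```

For the postings in main(), the pure disjunction prints [3, 7, 2500, 3000] and the min-should-match subclass prints [3, 7, 3000], in doc order. This sketch still visits every doc that any clause matches; a dedicated min-should-match scorer could instead skip ahead on lagging clauses, which is one way its algorithm might differ from a pure disjunction.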
