lucene-dev mailing list archives

From "Michael Busch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3328) Specialize BooleanQuery if all clauses are TermQueries
Date Thu, 21 Jul 2011 05:24:58 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068807#comment-13068807 ]

Michael Busch commented on LUCENE-3328:
---------------------------------------

Nice improvements!

I'm wondering if you considered having ConjunctionTermScorer use the terms' IDF values to
decide which iterator to advance when all are on the same docID? It should always be best
to pick the rarest term.

We've talked about doing that in the past, but it's hard to support this for any type of subclause,
because you'd have to add the ability to estimate the IDFs of possible subclauses.

But with this change it seems very feasible to try for BQs that only have TQ clauses.
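
To illustrate the idea, here is a minimal sketch in plain Java. It is not the code from the attached patch and does not use Lucene's classes: `Postings`, `intersect` and the `int[]` doc lists are hypothetical stand-ins for a term's DocsEnum. It assumes document frequency as the rarity measure (lowest docFreq means highest IDF), lets the rarest term lead the conjunction, and advances that term whenever all iterators sit on the same docID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RarestFirstConjunction {

    /** Hypothetical stand-in for one term's postings: sorted doc IDs plus a cursor. */
    static final class Postings {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        final int[] docs; // sorted ascending, no duplicates
        int pos = -1;     // -1 means "not positioned yet"

        Postings(int... docs) { this.docs = docs; }

        /** Number of documents containing the term; lower means rarer (higher IDF). */
        int docFreq() { return docs.length; }

        int docID() {
            if (pos < 0) return -1;
            return pos >= docs.length ? NO_MORE_DOCS : docs[pos];
        }

        /** Moves to the next document and returns its ID. */
        int nextDoc() { pos++; return docID(); }

        /** Moves to the first document with ID >= target and returns its ID. */
        int advance(int target) {
            if (pos < 0) pos = 0;
            while (pos < docs.length && docs[pos] < target) pos++;
            return docID();
        }
    }

    /** Returns the doc IDs present in every postings list; needs at least one list. */
    static List<Integer> intersect(Postings... terms) {
        Postings[] sorted = terms.clone();
        // Rarest term first: it leads the conjunction and is the one we step.
        Arrays.sort(sorted, Comparator.comparingInt(Postings::docFreq));
        Postings lead = sorted[0];

        List<Integer> hits = new ArrayList<>();
        int doc = lead.nextDoc();
        outer:
        while (doc != Postings.NO_MORE_DOCS) {
            for (int i = 1; i < sorted.length; i++) {
                int other = sorted[i].advance(doc);
                if (other != doc) {
                    if (other == Postings.NO_MORE_DOCS) break outer; // a term is exhausted
                    doc = lead.advance(other); // catch the lead up, then re-check everyone
                    continue outer;
                }
            }
            hits.add(doc);        // all iterators are on the same docID
            doc = lead.nextDoc(); // advance the rarest term, as suggested above
        }
        return hits;
    }
}
```

For example, intersecting the lists {1, 3, 5, 7, 9, 11}, {3, 9} and {2, 3, 4, 9, 10} makes the two-document list the lead and yields [3, 9]. Because the lead has the fewest postings, it bounds how many candidate documents the other iterators are asked to advance to.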

> Specialize BooleanQuery if all clauses are TermQueries
> ------------------------------------------------------
>
>                 Key: LUCENE-3328
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3328
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 3.4, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3328.patch, LUCENE-3328.patch, LUCENE-3328.patch
>
>
> During work on LUCENE-3319 I ran into issues with BooleanQuery compared to PhraseQuery
> in the exact case. If I disable scoring on PhraseQuery and bypass the position matching, essentially
> doing a conjunction match, ExactPhraseScorer beats plain boolean scorer by 40% which is a
> sizeable gain. I converted a ConjunctionScorer to use DocsEnum directly but still didn't get
> all the 40% from PhraseQuery. Yet, it turned out with further optimizations this gets very
> close to PhraseQuery. The biggest gain here came from converting the hand crafted loop in
> ConjunctionScorer#doNext to a for loop which seems to be less confusing to hotspot. In this
> particular case I think code specialization makes lots of sense since BQ with TQ is by far
> one of the most common queries.
> I will upload a patch shortly

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

