lucene-dev mailing list archives

From "Michael Busch (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3328) Specialize BooleanQuery if all clauses are TermQueries
Date Thu, 21 Jul 2011 16:43:57 GMT


Michael Busch commented on LUCENE-3328:

> The ConjunctionTermScorer sorts the DocsEnums by their frequency in the ctor. The leader will
> always be the lowest-frequency term in the set. Is this what you mean here?

Cool, yeah, that's roughly what I meant. In general, it's best to always pick the lowest-df
enum (the one with the smallest document frequency) as leader:
1) after initialization
2) after a hit was found
3) whenever a doc matched only m out of n enums, 1 < m < n

I think what you described covers situation 1). Does it also cover 2) and 3)?
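
To make the three situations concrete, here is a minimal sketch of a conjunction that keeps the lowest-df enum as leader throughout. It is not the code from the patch: Postings is a hypothetical stand-in for a DocsEnum backed by a sorted int[], and ConjunctionSketch and intersect are names made up for this illustration. It also shows only one reading of 2) and 3), namely that the leader is the enum that proposes the next candidate both after a hit and after a partial match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ConjunctionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a postings enum over a sorted array of doc IDs. */
    static final class Postings {
        final int[] docs;
        int pos = -1;

        Postings(int... docs) { this.docs = docs; }

        int docFreq() { return docs.length; }

        int docID() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        int nextDoc() { pos++; return docID(); }

        /** Moves to the first doc >= target; stays put if already there. */
        int advance(int target) {
            while (pos < 0 || (pos < docs.length && docs[pos] < target)) pos++;
            return docID();
        }
    }

    /** Intersects the given enums (at least one), driving from the lowest-df enum. */
    static List<Integer> intersect(Postings... enums) {
        Postings[] sorted = enums.clone();
        // Situation 1: after initialization, the lowest-df enum becomes the leader.
        Arrays.sort(sorted, Comparator.comparingInt(Postings::docFreq));
        Postings leader = sorted[0];

        List<Integer> hits = new ArrayList<>();
        int doc = leader.nextDoc();
        outer:
        while (doc != NO_MORE_DOCS) {
            for (int i = 1; i < sorted.length; i++) {
                int other = sorted[i].advance(doc);
                if (other != doc) {
                    // Situation 3: only some enums matched, so go back to the
                    // leader and let it propose the next candidate.
                    doc = leader.advance(other);
                    continue outer;
                }
            }
            hits.add(doc);
            // Situation 2: after a hit, the leader again picks the next candidate.
            doc = leader.nextDoc();
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Integer> hits = intersect(
            new Postings(1, 3, 5, 7, 9, 11),
            new Postings(3, 7, 11),
            new Postings(2, 3, 4, 7, 8, 10, 11, 12));
        System.out.println(hits); // prints [3, 7, 11]
    }
}
```

In this sketch the leader is fixed once by sorting on docFreq, and the remaining enums are checked in ascending df order so that a mismatch tends to be found early.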

> Specialize BooleanQuery if all clauses are TermQueries
> ------------------------------------------------------
>                 Key: LUCENE-3328
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 3.4, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>         Attachments: LUCENE-3328.patch, LUCENE-3328.patch, LUCENE-3328.patch
> During work on LUCENE-3319 I ran into issues with BooleanQuery compared to PhraseQuery
> in the exact case. If I disable scoring on PhraseQuery and bypass the position matching,
> essentially doing a conjunction match, ExactPhraseScorer beats the plain boolean scorer by
> 40%, which is a sizeable gain. I converted a ConjunctionScorer to use DocsEnum directly but
> still didn't get the full 40% from PhraseQuery. Yet it turned out that with further
> optimizations this gets very close to PhraseQuery. The biggest gain here came from
> converting the hand-crafted loop in ConjunctionScorer#doNext to a for loop, which seems to
> be less confusing to HotSpot. In this particular case I think code specialization makes a
> lot of sense, since BQ with TQ is by far one of the most common queries.
> I will upload a patch shortly.
