lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] Commented: (LUCENE-2690) Do MultiTermQuery boolean rewrites per segment
Date Sun, 10 Oct 2010 11:56:30 GMT


Robert Muir commented on LUCENE-2690:

The big gain with this approach is that you don't waste effort trying to
seek to non-existent terms in the sub-readers. Normally the terms
cache would save you here, but we never cache a miss, so when we
try to look that term up again it's always a real (costly) seek.
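
To make the cost of uncached misses concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: SegmentTerms, HitOnlyTermsCache and MissCostDemo are hypothetical stand-ins, where a sorted map models one segment's term dictionary and each call to seek() models a costly lookup. The cache stores only hits, matching the behavior described above, so a term that is missing from the segment pays for a real seek every time it is asked for.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of one segment's term dictionary (term -> docFreq).
// Each call to seek() stands in for a real, costly term lookup.
final class SegmentTerms {
    private final TreeMap<String, Integer> dict = new TreeMap<>();
    int seeks = 0;

    SegmentTerms(Map<String, Integer> terms) {
        dict.putAll(terms);
    }

    Integer seek(String term) {
        seeks++;                  // count every real seek
        return dict.get(term);    // null when the segment lacks the term
    }
}

// Hypothetical cache that remembers hits only; misses are never stored.
final class HitOnlyTermsCache {
    private final Map<String, Integer> cache = new HashMap<>();
    private final SegmentTerms segment;

    HitOnlyTermsCache(SegmentTerms segment) {
        this.segment = segment;
    }

    Integer lookup(String term) {
        Integer cached = cache.get(term);
        if (cached != null) {
            return cached;                 // hit: no seek needed
        }
        Integer df = segment.seek(term);
        if (df != null) {
            cache.put(term, df);           // only hits are cached
        }
        return df;                         // a miss falls through next time too
    }
}

final class MissCostDemo {
    public static void main(String[] args) {
        SegmentTerms seg = new SegmentTerms(Map.of("apple", 3, "apply", 1));
        HitOnlyTermsCache cache = new HitOnlyTermsCache(seg);
        // A top-level rewrite probes each segment for terms found anywhere in
        // the index, so "apricot" (absent here) is looked up again and again.
        for (int i = 0; i < 3; i++) {
            cache.lookup("apple");
            cache.lookup("apricot");
        }
        System.out.println("seeks = " + seg.seeks); // seeks = 4
    }
}
```

"apple" costs one seek and is then served from the cache, while each of the three "apricot" lookups is a fresh seek. Enumerating each segment's own terms instead of probing it with foreign ones avoids those seeks altogether.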

With this approach we can disable using the terms cache entirely from
MTQ.rewrite, which is great.

This is the way to go, because it's horrible for the MTQ to touch the terms cache at all,
and depending on it for good performance is even worse.

I think if you changed the benchmark to use multiple threads, with different
queries executing at the same time, you would see these queries fighting each other
over huge numbers of terms with df=1 and slowing each other down... but we wouldn't
have this problem with them rewriting to FakeQuery.

> Do MultiTermQuery boolean rewrites per segment
> ----------------------------------------------
>                 Key: LUCENE-2690
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>         Attachments: LUCENE-2690-hack.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch
> MultiTermQuery currently rewrites FuzzyQuery (using TopTermsBooleanQueryRewrite), the
> auto constant rewrite method and the ScoringBQ rewrite methods using a MultiFields wrapper
> on the top-level reader. This is inefficient.
> This patch changes the rewrite modes to do the rewrites per segment and uses some additional
> data structures (hashed sets/maps) to exclude duplicate terms. All tests currently pass, but
> FuzzyQuery's tests should not, because its minimum-score handling depends on the
> terms being collected in order.
> Robert will fix FuzzyQuery in this issue, too. This patch is just a start.
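
The per-segment rewrite described in the issue can be sketched roughly as follows. This is plain Java under stated assumptions, not the patch itself: PerSegmentRewriteSketch and collect() are hypothetical names, each segment is modeled as a sorted map from term to docFreq, and summing docFreq across segments is only an illustrative choice. Each segment's own terms are enumerated, so no segment is ever asked for a term it does not contain, and a hash map keyed by term drops duplicates that appear in several segments. The output also shows why FuzzyQuery's in-order assumption breaks: terms arrive in per-segment order, not in global sort order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Predicate;

final class PerSegmentRewriteSketch {
    // Hypothetical: collects accepted terms segment by segment. Each segment is
    // a sorted term dictionary (term -> docFreq); no Lucene API is used here.
    static Map<String, Integer> collect(List<? extends SortedMap<String, Integer>> segments,
                                        Predicate<String> accepts) {
        // Keyed by term, so a term seen in several segments is kept only once;
        // its docFreq is summed across segments (an illustrative choice).
        Map<String, Integer> terms = new LinkedHashMap<>();
        for (SortedMap<String, Integer> segment : segments) {
            // Only terms that exist in this segment are visited: no seeks
            // for terms that were found in other segments.
            for (Map.Entry<String, Integer> e : segment.entrySet()) {
                if (accepts.test(e.getKey())) {
                    terms.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> seg0 = new TreeMap<>(Map.of("apply", 1, "banana", 4));
        SortedMap<String, Integer> seg1 = new TreeMap<>(Map.of("apple", 2, "apply", 2, "apricot", 1));
        Map<String, Integer> result = collect(List.of(seg0, seg1), t -> t.startsWith("ap"));
        // Insertion order follows the segments, not global term order.
        System.out.println(result); // {apply=3, apple=2, apricot=1}
    }
}
```

A LinkedHashMap is used only so that the printed order shows the per-segment arrival order. A rewrite that needs sorted or top-N terms would have to sort or rank after collection rather than rely on enumeration order.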
