lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (LUCENE-2690) Do MultiTermQuery boolean rewrites per segment
Date Thu, 14 Oct 2010 11:53:32 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920930#action_12920930 ]

Uwe Schindler edited comment on LUCENE-2690 at 10/14/10 7:52 AM:
-----------------------------------------------------------------

This is the "attributes hell" patch (the FuzzyTermsEnum side is not finished yet; Robert,
can you review?).

The changes are:
- BoostAttribute is only added to the TermsEnum, because the TermsEnum produces the boost
and the MTQ rewrite consumes it.
- MaxNonCompetitiveBoostAttribute is owned by the rewrite mode, as it is the producer. The
TermsEnum consumes this attribute.

The fix needs the hackish attributes() method in the Fuzzy rewrite.

TODO: Contrib/Solr is not yet reviewed for the API change in MTQ.getTermsEnum()!
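
To illustrate the producer/consumer split described above, here is a minimal sketch in plain Java. It does not use Lucene's classes: BoostSlot, MaxNonCompetitiveBoostSlot, SketchTermsEnum and SketchTopTermsRewrite are hypothetical stand-ins invented for this example, and the "skip anything at or below the floor" rule is an assumption of the sketch, not a statement about how the patch behaves.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a boost attribute: written by the enum, read by the rewrite.
final class BoostSlot {
    float boost = 1.0f;
}

// Hypothetical stand-in for the "max non-competitive boost":
// written by the rewrite, read by the enum.
final class MaxNonCompetitiveBoostSlot {
    float max = Float.NEGATIVE_INFINITY;
}

final class ScoredTerm {
    final String term;
    final float boost;
    ScoredTerm(String term, float boost) { this.term = term; this.boost = boost; }
}

// The enum owns (produces) the boost and only consumes the floor it is handed.
final class SketchTermsEnum {
    private final String[] terms;
    private final float[] boosts;
    private final MaxNonCompetitiveBoostSlot floor;
    private final BoostSlot boostSlot = new BoostSlot();
    private int pos = -1;

    SketchTermsEnum(String[] terms, float[] boosts, MaxNonCompetitiveBoostSlot floor) {
        this.terms = terms;
        this.boosts = boosts;
        this.floor = floor;
    }

    BoostSlot boostSlot() { return boostSlot; }

    // Returns the next term whose boost beats the current floor, or null when exhausted.
    String next() {
        while (++pos < terms.length) {
            if (boosts[pos] > floor.max) {
                boostSlot.boost = boosts[pos];
                return terms[pos];
            }
        }
        return null;
    }
}

// The rewrite owns (produces) the floor and consumes the boost.
final class SketchTopTermsRewrite {
    static List<String> collectTopTerms(String[] terms, float[] boosts, int maxSize) {
        MaxNonCompetitiveBoostSlot floor = new MaxNonCompetitiveBoostSlot();
        SketchTermsEnum termsEnum = new SketchTermsEnum(terms, boosts, floor);
        BoostSlot boostSlot = termsEnum.boostSlot();
        Comparator<ScoredTerm> byBoost = Comparator.comparingDouble(t -> t.boost);
        PriorityQueue<ScoredTerm> queue = new PriorityQueue<>(byBoost); // min-heap on boost
        String term;
        while ((term = termsEnum.next()) != null) {
            queue.add(new ScoredTerm(term, boostSlot.boost));
            if (queue.size() > maxSize) {
                queue.poll(); // drop the weakest entry
            }
            if (queue.size() == maxSize) {
                floor.max = queue.peek().boost; // tell the enum what is no longer competitive
            }
        }
        List<String> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            result.add(0, queue.poll().term); // highest boost ends up first
        }
        return result;
    }
}
```

The point of the sketch is only the ownership: each side creates the slot it writes to and receives a reference to the slot it reads from.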

      was (Author: thetaphi):
    This is the "attributes hell" patch (the FuzzyTermsEnum side is not finished yet; Robert,
can you review?).

The changes are:
- BoostAttribute is only added to the TermsEnum, because the TermsEnum produces the boost
and the MTQ rewrite consumes it.
- MaxNonCompetitiveBoostAttribute is owned by the rewrite mode, as it is the producer. The
TermsEnum consumes this attribute.

The fix needs the hackish attributes() method in the Fuzzy rewrite.
  
> Do MultiTermQuery boolean rewrites per segment
> ----------------------------------------------
>
>                 Key: LUCENE-2690
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2690
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2690-attributes.patch, LUCENE-2690-hack.patch, LUCENE-2690.patch,
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch,
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch
>
>
> MultiTermQuery currently rewrites FuzzyQuery (using TopTermsBooleanQueryRewrite), the
> auto constant rewrite method, and the ScoringBQ rewrite methods using a MultiFields wrapper
> on the top-level reader. This is inefficient.
> This patch changes the rewrite modes to do the rewrites per segment and uses some additional
> data structures (hashed sets/maps) to exclude duplicate terms. All tests currently pass, but
> FuzzyQuery's tests should not, because its minimum score handling depends on the terms
> being collected in order.
> Robert will fix FuzzyQuery in this issue, too. This patch is just a start.
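
The per-segment idea in the description can be sketched in plain Java as follows. This is only an illustration, not the patch: PerSegmentCollectSketch and its collect method are hypothetical names, each segment is modelled as a sorted map from term to document frequency, and a LinkedHashMap is used as the hashed map so that the collection order is visible.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: visit each segment's terms separately and merge them
// through a hash map so that a term seen in several segments is kept once.
final class PerSegmentCollectSketch {
    static Map<String, Integer> collect(List<Map<String, Integer>> segments) {
        Map<String, Integer> merged = new LinkedHashMap<>();
        for (Map<String, Integer> segment : segments) {
            for (Map.Entry<String, Integer> e : segment.entrySet()) {
                // Duplicate terms collapse into one entry; doc frequencies are summed.
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }
}
```

Note that terms are sorted within each segment but not across segments, so the merged collection order is no longer globally sorted. That is the kind of ordering assumption the description says FuzzyQuery's minimum score handling relies on.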

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



