lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1644) Enable MultiTermQuery's constant score mode to also use BooleanQuery under the hood
Date Wed, 22 Jul 2009 10:49:14 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734070#action_12734070
] 

Uwe Schindler commented on LUCENE-1644:
---------------------------------------

The biggest problem is, that this caching gets completely wired with multi-segment indexes:
The rewriting is done on the top-level reader. In this case the boolean query would be built
and the terms cached. If there are too many terms, it creates a filter instance with the cached
terms.
The rewritten query is then executed against all sub-readers using the cached terms and a
fixed term enum. Normally this would create a docidset for the current index reader, the rewrite
did it for the top-level index reader -> the wron doc ids are returned and so on. So you
cannot reuse the collected terms from the rewrite operation in the getDocIdSet calls.

So please turn of this caching at all! As noted before, the important thing is, that the returned
filter by rewrite is stateless and should not know anythis about index readers. The idex reader
is passed in getDocIdSet any is different for non-optimized indexes.

You have seen no tests fail, because all RangeQuery tests use optimized indexes.

> Enable MultiTermQuery's constant score mode to also use BooleanQuery under the hood
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1644
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1644
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1644.patch, LUCENE-1644.patch
>
>
> When MultiTermQuery is used (via one of its subclasses, eg
> WildcardQuery, PrefixQuery, FuzzyQuery, etc.), you can ask it to use
> "constant score mode", which pre-builds a filter and then wraps that
> filter as a ConstantScoreQuery.
> If you don't set that, it instead builds a [potentially massive]
> BooleanQuery with one SHOULD clause per term.
> There are some limitations of this approach:
>   * The scores returned by the BooleanQuery are often quite
>     meaningless to the app, so, one should be able to use a
>     BooleanQuery yet get constant scores back.  (Though I vaguely
>     remember at least one example someone raised where the scores were
>     useful...).
>   * The resulting BooleanQuery can easily have too many clauses,
>     throwing an extremely confusing exception to newish users.
>   * It'd be better to have the freedom to pick "build filter up front"
>     vs "build massive BooleanQuery", when constant scoring is enabled,
>     because they have different performance tradeoffs.
>   * In constant score mode, an OpenBitSet is always used, yet for
>     sparse bit sets this does not give good performance.
> I think we could address these issues by giving BooleanQuery a
> constant score mode, then empower MultiTermQuery (when in constant
> score mode) to pick & choose whether to use BooleanQuery vs up-front
> filter, and finally empower MultiTermQuery to pick the best (sparse vs
> dense) bit set impl.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message