lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution
Date Thu, 15 Oct 2009 09:52:12 GMT
On Thu, Oct 15, 2009 at 4:57 AM, Shaun Senecal <ssenecal.work@gmail.com> wrote:

> Up to Lucene 2.4, this has been working out for us.  However, in
> Lucene 2.9 this breaks since rewrite() now returns a
> ConstantScoreQuery.

You can get back to the 2.4 behavior by calling
prefixQuery.setRewriteMethod(prefixQuery.SCORING_BOOLEAN_QUERY_REWRITE)
before calling rewrite().

> Is there a way I can know that a ConstantScoreQuery will match at
> least 1 term (if not, I dont want to add it to the BooleanQuery)?

There is a new method in 2.9: MultiTermQuery.getTotalNumberOfTerms(),
which returns how many terms were visited during rewrite.  Would that
work?

> My understanding is that Lucene will apply the Filter (C | D) first,
> limiting the result set, then apply the Query (A | B).  Is this
> correct?

Actually the filter & query clauses are AND'd in a sort of leapfrog
fashion, taking turns skipping up to the other's doc ID and only
accepting a doc ID when they both skip to the same point.  But this
(the mechanics of how Lucene takes a filter into account) is an
implementation detail and is likely to change.

> If so, the end result is essentially the query: (A | B) & (C | D)

Except that C, D contribute no scoring information, if scoring
matters.  If scoring doesn't matter, entirely (even for A, B), you
should use a collector that does not call score() at all to save CPU.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message