lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shaun Senecal <ssenecal.w...@gmail.com>
Subject Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution
Date Thu, 15 Oct 2009 21:42:52 GMT
At first I thought so, yes, but then I realised that the query I wanted to
execute was A | B | C | D and in reality I was executing (A | B) & (C | D).
I guess my unit tests were missing some cases and don't currently catch
this.



On Thu, Oct 15, 2009 at 11:59 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> You should be able to do exactly what you were doing on 2.4, right?
> (By setting the rewrite method).
>
> Mike
>
> On Thu, Oct 15, 2009 at 8:30 AM, Shaun Senecal <ssenecal.work@gmail.com>
> wrote:
> > Thanks for the explanation Mike.  It looks like I have no choice but to
> move
> > any queries which throw TooManyClauses to be Filters. Sadly, this means a
> > max query time of 6s under load unless I can find a way to rewrite the
> query
> > to span a Query and a Filter.
> >
> >
> > Thanks again
> >
> >
> >
> > On Thu, Oct 15, 2009 at 6:52 PM, Michael McCandless <
> > lucene@mikemccandless.com> wrote:
> >
> >> On Thu, Oct 15, 2009 at 4:57 AM, Shaun Senecal <ssenecal.work@gmail.com
> >
> >> wrote:
> >>
> >> > Up to Lucene 2.4, this has been working out for us.  However, in
> >> > Lucene 2.9 this breaks since rewrite() now returns a
> >> > ConstantScoreQuery.
> >>
> >> You can get back to the 2.4 behavior by calling
> >> prefixQuery.setRewriteMethod(prefixQuery.SCORING_BOOLEAN_QUERY_REWRITE)
> >> before calling rewrite().
> >>
> >> > Is there a way I can know that a ConstantScoreQuery will match at
> >> > least 1 term (if not, I dont want to add it to the BooleanQuery)?
> >>
> >> There is a new method in 2.9: MultiTermQuery.getTotalNumberOfTerms(),
> >> which returns how many terms were visited during rewrite.  Would that
> >> work?
> >>
> >> > My understanding is that Lucene will apply the Filter (C | D) first,
> >> > limiting the result set, then apply the Query (A | B).  Is this
> >> > correct?
> >>
> >> Actually the filter & query clauses are AND'd in a sort of leapfrog
> >> fashion, taking turns skipping up to the other's doc ID and only
> >> accepting a doc ID when they both skip to the same point.  But this
> >> (the mechanics of how Lucene takes a filter into account) is an
> >> implementation detail and is likely to change.
> >>
> >> > If so, the end result is essentially the query: (A | B) & (C | D)
> >>
> >> Except that C, D contribute no scoring information, if scoring
> >> matters.  If scoring doesn't matter, entirely (even for A, B), you
> >> should use a collector that does not call score() at all to save CPU.
> >>
> >> Mike
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message