lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shaun Senecal <ssenecal.w...@gmail.com>
Subject Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution
Date Fri, 16 Oct 2009 08:06:29 GMT
Thanks Mike.  The queries are now running faster than they ever were before,
and are returning the expected results!


On Fri, Oct 16, 2009 at 7:39 AM, Shaun Senecal <ssenecal.work@gmail.com>wrote:

> Ah!  I thought that the ConstantScoreQuery would also be rewritten into a
> BooleanQuery, resulting in the same exception.  If that's the case, then
> this should work.  I'll give that a try when I get into the office this
> morning.
>
>
>
> On Fri, Oct 16, 2009 at 6:46 AM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> Well, you could wrap the C | D filter as a Query (using
>> ConstantScoreQuery), and then add that as a SHOULD clause on your
>> toplevel BooleanQuery?
>>
>> Mike
>>
>> On Thu, Oct 15, 2009 at 5:42 PM, Shaun Senecal <ssenecal.work@gmail.com>
>> wrote:
>> > At first I thought so, yes, but then I realised that the query I wanted
>> to
>> > execute was A | B | C | D and in reality I was executing (A | B) & (C |
>> D).
>> > I guess my unit tests were missing some cases and don't currently catch
>> > this.
>> >
>> >
>> >
>> > On Thu, Oct 15, 2009 at 11:59 PM, Michael McCandless <
>> > lucene@mikemccandless.com> wrote:
>> >
>> >> You should be able to do exactly what you were doing on 2.4, right?
>> >> (By setting the rewrite method).
>> >>
>> >> Mike
>> >>
>> >> On Thu, Oct 15, 2009 at 8:30 AM, Shaun Senecal <
>> ssenecal.work@gmail.com>
>> >> wrote:
>> >> > Thanks for the explanation Mike.  It looks like I have no choice but
>> to
>> >> move
>> >> > any queries which throw TooManyClauses to be Filters. Sadly, this
>> means a
>> >> > max query time of 6s under load unless I can find a way to rewrite
>> the
>> >> query
>> >> > to span a Query and a Filter.
>> >> >
>> >> >
>> >> > Thanks again
>> >> >
>> >> >
>> >> >
>> >> > On Thu, Oct 15, 2009 at 6:52 PM, Michael McCandless <
>> >> > lucene@mikemccandless.com> wrote:
>> >> >
>> >> >> On Thu, Oct 15, 2009 at 4:57 AM, Shaun Senecal <
>> ssenecal.work@gmail.com
>> >> >
>> >> >> wrote:
>> >> >>
>> >> >> > Up to Lucene 2.4, this has been working out for us.  However,
in
>> >> >> > Lucene 2.9 this breaks since rewrite() now returns a
>> >> >> > ConstantScoreQuery.
>> >> >>
>> >> >> You can get back to the 2.4 behavior by calling
>> >> >>
>> prefixQuery.setRewriteMethod(prefixQuery.SCORING_BOOLEAN_QUERY_REWRITE)
>> >> >> before calling rewrite().
>> >> >>
>> >> >> > Is there a way I can know that a ConstantScoreQuery will match
at
>> >> >> > least 1 term (if not, I dont want to add it to the BooleanQuery)?
>> >> >>
>> >> >> There is a new method in 2.9:
>> MultiTermQuery.getTotalNumberOfTerms(),
>> >> >> which returns how many terms were visited during rewrite.  Would
>> that
>> >> >> work?
>> >> >>
>> >> >> > My understanding is that Lucene will apply the Filter (C |
D)
>> first,
>> >> >> > limiting the result set, then apply the Query (A | B).  Is
this
>> >> >> > correct?
>> >> >>
>> >> >> Actually the filter & query clauses are AND'd in a sort of
leapfrog
>> >> >> fashion, taking turns skipping up to the other's doc ID and only
>> >> >> accepting a doc ID when they both skip to the same point.  But
this
>> >> >> (the mechanics of how Lucene takes a filter into account) is an
>> >> >> implementation detail and is likely to change.
>> >> >>
>> >> >> > If so, the end result is essentially the query: (A | B) &
(C | D)
>> >> >>
>> >> >> Except that C, D contribute no scoring information, if scoring
>> >> >> matters.  If scoring doesn't matter, entirely (even for A, B),
you
>> >> >> should use a collector that does not call score() at all to save
>> CPU.
>> >> >>
>> >> >> Mike
>> >> >>
>> >> >>
>> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >> >>
>> >> >>
>> >> >
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>
>> >>
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message