lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution
Date Fri, 16 Oct 2009 09:13:15 GMT
Super!

Mike

On Fri, Oct 16, 2009 at 4:06 AM, Shaun Senecal <ssenecal.work@gmail.com> wrote:
> Thanks Mike.  The queries are now running faster than they ever were before,
> and are returning the expected results!
>
>
> On Fri, Oct 16, 2009 at 7:39 AM, Shaun Senecal <ssenecal.work@gmail.com> wrote:
>
>> Ah!  I thought that the ConstantScoreQuery would also be rewritten into a
>> BooleanQuery, resulting in the same exception.  If that's the case, then
>> this should work.  I'll give that a try when I get into the office this
>> morning.
>>
>>
>>
>> On Fri, Oct 16, 2009 at 6:46 AM, Michael McCandless <
>> lucene@mikemccandless.com> wrote:
>>
>>> Well, you could wrap the C | D filter as a Query (using
>>> ConstantScoreQuery), and then add that as a SHOULD clause on your
>>> toplevel BooleanQuery?
>>>
>>> Mike
>>>
>>> On Thu, Oct 15, 2009 at 5:42 PM, Shaun Senecal <ssenecal.work@gmail.com>
>>> wrote:
>>> > At first I thought so, yes, but then I realised that the query I wanted to
>>> > execute was A | B | C | D and in reality I was executing (A | B) & (C | D).
>>> > I guess my unit tests were missing some cases and don't currently catch
>>> > this.
>>> >
>>> >
>>> >
>>> > On Thu, Oct 15, 2009 at 11:59 PM, Michael McCandless <
>>> > lucene@mikemccandless.com> wrote:
>>> >
>>> >> You should be able to do exactly what you were doing on 2.4, right?
>>> >> (By setting the rewrite method).
>>> >>
>>> >> Mike
>>> >>
>>> >> On Thu, Oct 15, 2009 at 8:30 AM, Shaun Senecal <ssenecal.work@gmail.com>
>>> >> wrote:
>>> >> > Thanks for the explanation Mike.  It looks like I have no choice but to
>>> >> > move any queries which throw TooManyClauses to be Filters. Sadly, this
>>> >> > means a max query time of 6s under load unless I can find a way to
>>> >> > rewrite the query to span a Query and a Filter.
>>> >> >
>>> >> >
>>> >> > Thanks again
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Thu, Oct 15, 2009 at 6:52 PM, Michael McCandless <
>>> >> > lucene@mikemccandless.com> wrote:
>>> >> >
>>> >> >> On Thu, Oct 15, 2009 at 4:57 AM, Shaun Senecal <ssenecal.work@gmail.com>
>>> >> >> wrote:
>>> >> >>
>>> >> >> > Up to Lucene 2.4, this has been working out for us.  However, in
>>> >> >> > Lucene 2.9 this breaks since rewrite() now returns a
>>> >> >> > ConstantScoreQuery.
>>> >> >>
>>> >> >> You can get back to the 2.4 behavior by calling
>>> >> >> prefixQuery.setRewriteMethod(prefixQuery.SCORING_BOOLEAN_QUERY_REWRITE)
>>> >> >> before calling rewrite().
>>> >> >>
>>> >> >> > Is there a way I can know that a ConstantScoreQuery will match at
>>> >> >> > least 1 term (if not, I don't want to add it to the BooleanQuery)?
>>> >> >>
>>> >> >> There is a new method in 2.9: MultiTermQuery.getTotalNumberOfTerms(),
>>> >> >> which returns how many terms were visited during rewrite.  Would that
>>> >> >> work?
>>> >> >>
>>> >> >> > My understanding is that Lucene will apply the Filter (C | D) first,
>>> >> >> > limiting the result set, then apply the Query (A | B).  Is this
>>> >> >> > correct?
>>> >> >>
>>> >> >> Actually the filter & query clauses are AND'd in a sort of leapfrog
>>> >> >> fashion, taking turns skipping up to the other's doc ID and only
>>> >> >> accepting a doc ID when they both skip to the same point.  But this
>>> >> >> (the mechanics of how Lucene takes a filter into account) is an
>>> >> >> implementation detail and is likely to change.
>>> >> >>
>>> >> >> > If so, the end result is essentially the query: (A | B) & (C | D)
>>> >> >>
>>> >> >> Except that C, D contribute no scoring information, if scoring
>>> >> >> matters.  If scoring doesn't matter, entirely (even for A, B), you
>>> >> >> should use a collector that does not call score() at all to save CPU.
>>> >> >>
>>> >> >> Mike
>>> >> >>
>>> >> >> ---------------------------------------------------------------------
>>> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>>> >> >>
>>> >> >>
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >
>>>
>>>
>>>
>>
>
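
The difference between the two query shapes discussed in this thread can be illustrated with plain sets of doc IDs. The following is only a minimal sketch in plain Java, not Lucene code: the class name, the helper methods, and the doc ID sets for A, B, C and D are all made up for illustration. Applying C | D as a Filter intersects it with the query, whereas adding it as another SHOULD clause (for example wrapped in a ConstantScoreQuery, as suggested above) unions it with the other clauses.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class ClauseSemanticsSketch {
    // Docs matching either clause (what two SHOULD clauses give you).
    static Set<Integer> union(Set<Integer> x, Set<Integer> y) {
        Set<Integer> out = new TreeSet<>(x);
        out.addAll(y);
        return out;
    }

    // Docs matching both sides (what query + filter gives you).
    static Set<Integer> intersection(Set<Integer> x, Set<Integer> y) {
        Set<Integer> out = new TreeSet<>(x);
        out.retainAll(y);
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical doc IDs matched by each clause.
        Set<Integer> a = new TreeSet<>(Arrays.asList(1, 2));
        Set<Integer> b = new TreeSet<>(Arrays.asList(3));
        Set<Integer> c = new TreeSet<>(Arrays.asList(2, 5));
        Set<Integer> d = new TreeSet<>(Arrays.asList(6));

        // Query (A | B) restricted by Filter (C | D): (A | B) & (C | D)
        System.out.println(intersection(union(a, b), union(c, d))); // prints [2]

        // All four as SHOULD clauses: A | B | C | D
        System.out.println(union(union(a, b), union(c, d))); // prints [1, 2, 3, 5, 6]
    }
}
```

With these made-up sets, docs 1, 3, 5 and 6 are lost when C | D is applied as a filter, which is the kind of case a unit test would need to cover to catch the mistake.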

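The "leapfrog" AND of filter and query described earlier in the thread can be sketched roughly as follows. This is plain Java over sorted arrays of doc IDs, not Lucene's actual implementation (which, as noted above, is an implementation detail and likely to change). The class name, the `advance` helper, and the sample doc IDs are assumptions made for illustration, and a real implementation would skip more cleverly than this linear scan.

```java
import java.util.ArrayList;
import java.util.List;

public class LeapfrogSketch {
    // Index of the first element >= target at or after 'from',
    // or sorted.length if there is none.
    static int advance(int[] sorted, int from, int target) {
        int i = from;
        while (i < sorted.length && sorted[i] < target) {
            i++;
        }
        return i;
    }

    // Each side takes turns skipping up to the other's current doc ID;
    // a doc ID is accepted only when both land on the same value.
    static List<Integer> intersect(int[] queryDocs, int[] filterDocs) {
        List<Integer> hits = new ArrayList<>();
        int q = 0;
        int f = 0;
        while (q < queryDocs.length && f < filterDocs.length) {
            if (queryDocs[q] == filterDocs[f]) {
                hits.add(queryDocs[q]);
                q++;
                f++;
            } else if (queryDocs[q] < filterDocs[f]) {
                q = advance(queryDocs, q, filterDocs[f]);
            } else {
                f = advance(filterDocs, f, queryDocs[q]);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] queryDocs = {1, 4, 7, 9, 15};       // hypothetical matches for A | B
        int[] filterDocs = {2, 4, 9, 10, 15, 20}; // hypothetical matches for C | D
        System.out.println(intersect(queryDocs, filterDocs)); // prints [4, 9, 15]
    }
}
```

Neither side is applied "first" in this scheme: the filter does not pre-limit the result set, the two sorted streams simply converge on their common doc IDs.
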