lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Eran Sevi <erans...@gmail.com>
Subject Re: Efficient filtering advise
Date Sun, 22 Nov 2009 16:41:26 GMT
I think it shouldn't take X5 times longer since the number of results is
only about X2 times larger (and much smaller than the number of terms in the
filter), but maybe I'm wrong here since I'm not familiar with the filter
internals.

Unfortunately, the time to construct the filter is mere milliseconds.
almost all of the time (~5secs) are spent in the search method.
I'm using a collector to retrieve all the results (and fetch a value for
some fields) but without the filter this also takes less then a second for
the same number of results.

On Sun, Nov 22, 2009 at 5:57 PM, Erick Erickson <erickerickson@gmail.com>wrote:

> Hmmm, I'm not very clear here. Are you saying that you effectively
> form 10-50K filters and OR them all together? That would be
> consistent with the 50K case taking approx. 5X a long as the 10K
> case.....
>
> Do you know where in your code the time is being spent? That'd
> be a big help in suggesting alternatives. If I'm on the right track,
> I'd expect the time to be spent assembling the filters.....
>
> Not much help here, but I'm having trouble wrapping my head
> around this...
>
> Best
> Erick
>
> On Sun, Nov 22, 2009 at 9:48 AM, Eran Sevi <eransevi@gmail.com> wrote:
>
> > Hi,
> >
> > I have a need to filter my queries using a rather large subset of terms
> > (can
> > be 10K or even 50K).
> > All these terms are sure to exist in the index so the number of results
> can
> > be about the same number of terms in the filter.
> > The terms are numbers but are not subsequent and are from a large set of
> > possible values (so range queries are probably not good for me).
> > The index itself is about 1M docs and running even a simple query with
> such
> > a large filter takes a lot of time even if the number of results is only
> a
> > few hundred docs.
> > It seems like the speed is affected by the length of the filter even if
> the
> > number of results remains more or less the same, which is logical but not
> > by
> > such a large loss of performance as I'm experiencing (running the query
> > with
> > a 10K terms filter takes an average of 1s 187ms with 600 results while
> > running it with a 50K terms filter takes an average of 5s 207ms with 1000
> > results).
> >
> > Currently I'm using a QueryFilter with a boolean query in which I "OR"
> the
> > different terms together.
> > I also can't use a cached filter efficiently since the terms to filter on
> > change almost every query.
> >
> > I was wondering if there's a better way to filter my queries so they
> won't
> > take a few seconds to run?
> >
> > Thanks in advance for any advise,
> > Eran.
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message