lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Efficient filtering advise
Date Sun, 22 Nov 2009 15:57:02 GMT
Hmmm, I'm not very clear here. Are you saying that you effectively
form 10-50K filters and OR them all together? That would be
consistent with the 50K case taking approx. 5X as long as the 10K
case.....

Do you know where in your code the time is being spent? That'd
be a big help in suggesting alternatives. If I'm on the right track,
I'd expect the time to be spent assembling the filters.....
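
One way to check is to time the two phases separately. Here is a minimal
sketch in plain Java. The buildFilter and runSearch steps are hypothetical
placeholders for your own code, not Lucene calls, and the work inside them
is made up purely so the example runs:

```java
import java.util.HashSet;
import java.util.Set;

final class PhaseTimer {
    /** Runs the step once and returns the elapsed wall-clock time in nanoseconds. */
    static long timeNanos(Runnable step) {
        long start = System.nanoTime();
        step.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for assembling the 50K-term filter.
        Runnable buildFilter = () -> {
            Set<String> terms = new HashSet<>();
            for (int i = 0; i < 50_000; i++) {
                terms.add(Integer.toString(i));
            }
        };
        // Hypothetical stand-in for executing the search itself.
        Runnable runSearch = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
        };

        long buildNs = timeNanos(buildFilter);
        long searchNs = timeNanos(runSearch);
        System.out.println("build filter: " + buildNs / 1_000_000 + " ms");
        System.out.println("run search:   " + searchNs / 1_000_000 + " ms");
    }
}
```

A single run is noisy, so repeat each phase a few times and ignore the first
(warm-up) measurement before drawing conclusions.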

Not much help here, but I'm having trouble wrapping my head
around this...
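
For what it's worth, if the time does turn out to go into assembling and
running the big OR query, one alternative worth trying is to skip the
BooleanQuery and set bits directly from each term's posting list. The sketch
below shows only the idea, in plain Java over a toy in-memory index. The
class TermSetFilterSketch and its methods are hypothetical and are not
Lucene APIs. I believe the contrib TermsFilter works along these lines, but
check that against your version:

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

final class TermSetFilterSketch {
    /** Toy inverted index: term -> IDs of the documents that contain it. */
    private final Map<String, int[]> postings = new HashMap<>();
    private final int maxDoc;

    TermSetFilterSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    void addPosting(String term, int... docIds) {
        postings.put(term, docIds);
    }

    /** Sets one bit per document that contains at least one of the given terms. */
    BitSet bitsFor(Iterable<String> terms) {
        BitSet bits = new BitSet(maxDoc);
        for (String term : terms) {
            int[] docs = postings.get(term);
            if (docs == null) {
                continue; // term not in the index: nothing to set
            }
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        TermSetFilterSketch index = new TermSetFilterSketch(8);
        index.addPosting("1001", 0, 3);
        index.addPosting("2002", 3, 5);
        index.addPosting("3003", 7);

        // Documents allowed through the filter for the terms 1001 and 2002.
        BitSet allowed = index.bitsFor(Arrays.asList("1001", "2002"));
        System.out.println(allowed);
    }
}
```

The cost here is one posting-list walk per term and no scoring, so it should
grow with the number of postings touched rather than with the overhead of
tens of thousands of scored clauses.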

Best
Erick

On Sun, Nov 22, 2009 at 9:48 AM, Eran Sevi <eransevi@gmail.com> wrote:

> Hi,
>
> I need to filter my queries using a rather large set of terms (10K, or
> even 50K).
> All of these terms are sure to exist in the index, so the number of results
> can be about the same as the number of terms in the filter.
> The terms are numbers, but they are not consecutive and come from a large set
> of possible values (so range queries are probably not a good fit).
> The index itself is about 1M docs, and running even a simple query with such
> a large filter takes a long time, even if the number of results is only a
> few hundred docs.
> It seems the speed is affected by the length of the filter even when the
> number of results stays more or less the same. That is logical, but I did not
> expect a performance loss as large as the one I'm seeing: running the query
> with a 10K-term filter takes an average of 1s 187ms with 600 results, while
> running it with a 50K-term filter takes an average of 5s 207ms with 1000
> results.
>
> Currently I'm using a QueryFilter with a boolean query in which I "OR" the
> different terms together.
> I also can't use a cached filter efficiently since the terms to filter on
> change almost every query.
>
> I was wondering if there's a better way to filter my queries so they won't
> take a few seconds to run?
>
> Thanks in advance for any advice,
> Eran.
>
