lucene-java-user mailing list archives

From Eran Sevi <erans...@gmail.com>
Subject Efficient filtering advice
Date Sun, 22 Nov 2009 14:48:39 GMT
Hi,

I need to filter my queries using a rather large set of terms (it can be
10K or even 50K terms).
All these terms are sure to exist in the index, so the number of results can
be about the same as the number of terms in the filter.
The terms are numbers, but they are not consecutive and come from a large
set of possible values (so range queries are probably no good for me).
The index itself holds about 1M docs, and running even a simple query with
such a large filter takes a long time, even if the number of results is only
a few hundred docs.
It seems that the speed is affected by the length of the filter even when
the number of results stays more or less the same. That is logical, but I
would not expect as large a loss of performance as I'm experiencing: running
the query with a 10K-term filter takes an average of 1.187 s with 600
results, while running it with a 50K-term filter takes an average of 5.207 s
with 1000 results.

Currently I'm using a QueryFilter with a boolean query in which I "OR" the
different terms together.
I also can't use a cached filter efficiently, since the terms to filter on
change with almost every query.
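
To make the setup concrete, here is a minimal sketch in plain Java of what
OR-ing the filter terms together amounts to. It is not Lucene code and not
the actual implementation: OrFilterSketch, addPosting, buildFilter and
applyFilter are hypothetical names, and the in-memory map is only an assumed
stand-in for the index. It shows why the cost of building the filter grows
with the number of terms, even when few docs survive the final intersection.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a term filter; a sketch, not the Lucene API.
class OrFilterSketch {
    // Assumed stand-in for an inverted index: term -> ids of docs containing it.
    private final Map<String, int[]> postings = new HashMap<>();
    private final int maxDoc;

    OrFilterSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    void addPosting(String term, int... docIds) {
        postings.put(term, docIds);
    }

    // Union ("OR") of the postings of every filter term.
    // The work done is proportional to the number of terms plus the
    // number of postings visited, regardless of the final result count.
    BitSet buildFilter(List<String> terms) {
        BitSet bits = new BitSet(maxDoc);
        for (String term : terms) {
            int[] docs = postings.get(term);
            if (docs == null) {
                continue; // term not in the index, contributes nothing
            }
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Keep only the query hits that the filter allows.
    static BitSet applyFilter(BitSet queryHits, BitSet filter) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(filter);
        return result;
    }
}
```

In this model the filter has to be rebuilt whenever the term list changes,
which is the situation described above where caching does not help.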

I was wondering if there's a better way to filter my queries so they won't
take a few seconds to run?

Thanks in advance for any advice,
Eran.
