lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Efficient filtering advise
Date Sun, 22 Nov 2009 17:37:57 GMT
Op zondag 22 november 2009 17:23:53 schreef Eran Sevi:
> Thanks for the tips.
> 
> I'm still using version 2.4 so I can't use MultiTermQueryWrapperFilter but
> I'll definitely try to re-group the the terms that are not changing in order
> to cache them.
> How can I join several such filters together?

There are various ways. OpenBitSet and OpenBitSetDISI can do this,
and there's also BooleanFilter and ChainedFilter in contrib.

> Using FieldCacheTermsFilter sounds promising. Fortunately it is a single
> value field (our unique doc id).


Regards,
Paul Elschot
 
> I'll consider very seriously moving to 2.9.1 in order to try it out and see
> if I can get so real gain from using it or maybe using TermsFilter from
> contrib.
> 
> 
> On Sun, Nov 22, 2009 at 6:10 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> 
> > Maybe this helps you, but read the docs, it will work only with
> > single-value-fields:
> >
> > http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/FieldC
> > acheTermsFilter.html<http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/FieldC%0AacheTermsFilter.html>
> >
> > Uwe
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >
> > > -----Original Message-----
> > > From: Eran Sevi [mailto:eransevi@gmail.com]
> > > Sent: Sunday, November 22, 2009 3:49 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Efficient filtering advise
> > >
> > > Hi,
> > >
> > > I have a need to filter my queries using a rather large subset of terms
> > > (can
> > > be 10K or even 50K).
> > > All these terms are sure to exist in the index so the number of results
> > > can
> > > be about the same number of terms in the filter.
> > > The terms are numbers but are not subsequent and are from a large set of
> > > possible values (so range queries are probably not good for me).
> > > The index itself is about 1M docs and running even a simple query with
> > > such
> > > a large filter takes a lot of time even if the number of results is only
> > a
> > > few hundred docs.
> > > It seems like the speed is affected by the length of the filter even if
> > > the
> > > number of results remains more or less the same, which is logical but not
> > > by
> > > such a large loss of performance as I'm experiencing (running the query
> > > with
> > > a 10K terms filter takes an average of 1s 187ms with 600 results while
> > > running it with a 50K terms filter takes an average of 5s 207ms with 1000
> > > results).
> > >
> > > Currently I'm using a QueryFilter with a boolean query in which I "OR"
> > the
> > > different terms together.
> > > I also can't use a cached filter efficiently since the terms to filter on
> > > change almost every query.
> > >
> > > I was wondering if there's a better way to filter my queries so they
> > won't
> > > take a few seconds to run?
> > >
> > > Thanks in advance for any advise,
> > > Eran.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message