lucene-dev mailing list archives

From eks dev <>
Subject Re: Combining search steps without re-searching
Date Mon, 28 Aug 2006 21:17:03 GMT
You are right, Chuck, it depends... Filters are great for fields with small cardinality (the
majority of terms in a normal collection) or for things that are sorted (assuming Paul's patch
gets committed, so that we no longer need BitSet and can use less memory-hungry structures such
as interval lists :). With BitSet, paradoxically, it makes sense to use them for high-frequency
terms to save memory.
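
To illustrate the interval-list idea, here is a minimal sketch in plain Java. It is not
Paul's patch and not Lucene API; the class and method names are made up for this example.
It assumes the index is sorted so that matching documents form a few runs of consecutive
doc ids, in which case storing the runs costs far less than one bit per document.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, for illustration only (not part of Lucene).
class IntervalListSketch {

    /** Converts the set bits into half-open [start, end) runs of consecutive doc ids. */
    static List<int[]> toIntervals(BitSet bits) {
        List<int[]> runs = new ArrayList<>();
        int start = bits.nextSetBit(0);
        while (start >= 0) {
            int end = bits.nextClearBit(start); // first doc id after the run
            runs.add(new int[] {start, end});
            start = bits.nextSetBit(end);       // -1 when no further run exists
        }
        return runs;
    }

    /** Rough payload size: two 4-byte ints per run, ignoring object overhead. */
    static long intervalBytes(List<int[]> runs) {
        return 8L * runs.size();
    }
}
```

With two runs the payload is 16 bytes, however large the index is. A term whose matches
are scattered would produce many short runs, and the advantage disappears.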

Hi committers, any chance of getting rid of BitSet in Filter? Can somebody point out what else
needs to be done to get it committed? We have a pair of hands to help...

----- Original Message ----
From: Chuck Williams <>
Sent: Monday, 28 August, 2006 10:51:40 PM
Subject: Re: Combining search steps without re-searching

Andrzej Bialecki wrote on 08/28/2006 09:19 AM:
> Chuck Williams wrote:
>> I presume your search steps are anded, as in typical drill-downs?
>> From a Lucene standpoint, each sequence of steps is a BooleanQuery of
>> required clauses, one for each step.  To add a step, you extend the
>> BooleanQuery with a new clause.  To not re-evaluate the full query,
> ... umm, guys, wouldn't a series of QueryFilters work much better in
> this case? If some of the clauses are repeatable, then filtering
> results through a cached BitSet in such filtered query would work
> nicely, right?
If the possible initial steps comprise a small finite set, I could see
that as a winner.  In my app for instance, the drill-down selectors are
dynamic and drawn from a large set of possibilities.  It's hard to see
how any small set of filters would be much of a benefit.  A large set of
filters would consume too much space.  For a 10-million-document node, at
1.25 megabytes per filter, even a couple hundred filters add up to
something significant.
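
The arithmetic behind that figure can be sketched as follows. This assumes one bit per
document and reads "a couple hundred" as 200; the class and method names are made up for
the example, and real BitSet instances carry a little extra object overhead.

```java
// Hypothetical helper, for illustration only (not part of Lucene).
class FilterMemoryEstimate {

    /** One bit per document, rounded up to whole bytes. */
    static long bytesPerFilter(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    /** Total payload for a number of cached filters over the same index. */
    static long totalBytes(long maxDoc, int filterCount) {
        return bytesPerFilter(maxDoc) * filterCount;
    }

    public static void main(String[] args) {
        long perFilter = bytesPerFilter(10_000_000L); // 1,250,000 bytes = 1.25 MB
        long total = totalBytes(10_000_000L, 200);    // 250,000,000 bytes = 250 MB
        System.out.println(perFilter + " bytes per filter, " + total + " bytes for 200 filters");
    }
}
```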

As I understand things, filters take considerably more time to initially
create but then can more than make this up through repetitive use.  So
they are a winner iff there are a small number of specific steps that
are frequently and disproportionately used.
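
A minimal sketch of that trade-off, using plain java.util.BitSet as a stand-in for a
cached filter. None of this is Lucene API: the class, the step names, and the IntPredicate
standing in for "does this document match the step" are assumptions for illustration.
Building a step's bit set visits every document once; later drill-downs only AND the
cached sets.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical cache of per-step bit sets, for illustration only (not part of Lucene).
class CachedStepFilters {
    private final int maxDoc;
    private final Map<String, BitSet> cache = new HashMap<>();
    int builds = 0; // how many times the expensive build path ran

    CachedStepFilters(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** Returns the cached bit set for a step, building it on first use only. */
    BitSet filterFor(String step, IntPredicate matches) {
        BitSet bits = cache.get(step);
        if (bits == null) {
            bits = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) { // expensive: visits every document
                if (matches.test(doc)) {
                    bits.set(doc);
                }
            }
            cache.put(step, bits);
            builds++;
        }
        return bits;
    }

    /** ANDs the step filters into a fresh set; expects at least one step. */
    static BitSet drillDown(BitSet... steps) {
        BitSet result = (BitSet) steps[0].clone(); // leave the cached sets untouched
        for (int i = 1; i < steps.length; i++) {
            result.and(steps[i]);
        }
        return result;
    }
}
```

The cache only pays off when the same step keys recur; with dynamic selectors drawn from
a large set, most lookups would take the build path and the map would keep growing.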


To unsubscribe, e-mail:
For additional commands, e-mail:
