lucene-java-user mailing list archives

From Morus Walter <morus.wal...@tanto.de>
Subject Re: Numeric Range Restrictions: Queries vs Filters
Date Tue, 23 Nov 2004 08:26:31 GMT
Hoss writes:
> 
> (c) Filtering.  Filters in general make a lot of sense to me.  They are a
> way to specify (at query time) that only a certain subset of the index
> should be considered for results.  The Filter class has a very straight
> forward API that seems very easy to subclass to get the behavior I want.
> The Query API on the other hand ... I freely admit, that I can't make
> heads or tails out of it.  I don't even know where I would begin to try
> and write a new subclass of Query if I wanted to.
> 
> I would think that most people who want to do a "numeric range
> restriction" on their data, probably don't care about the Scoring benefits
> of RangeQuery.  Looking at the code base, the way DateFilter works seems
> like it provides an ideal solution to any sort of Range restriction (not
> just Dates) that *should* be more efficient than using RangeQuery when
> dealing with an unbounded value set. (Both approaches need to iterate over
> all of the terms in the specified field using TermEnum, but RangeQuery has
> to build up a set of BooleanQuery objects for each matching term, and
> then each of those queries has to help score the documents -- DateFilter
> on the other hand only has to maintain a single BitSet of documents that
> it finds as it iterates)
> 
IMO there's another option, at least as long as the number of documents
isn't too high.
Sorting already builds a list of all values of the sort field, one per
document, and uses it during the search.
Nothing prevents you from using that approach for search restriction as well.
The advantage is that you can create the list once and reuse it for different
ranges until the index changes, whereas a filter can only represent
one range.
The disadvantage is that you have to keep one value per document in
memory instead of one bit per document in a filter.
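
A rough sketch of that idea, in plain Java with no Lucene calls: one value
per document number is held in an array, and each range request scans the
array and marks matching documents in a BitSet. The class name
FieldValueCache, the method range(), and the sample date values are
hypothetical and only illustrate the approach. Filling the array from the
index is assumed to happen once, elsewhere.

```java
import java.util.BitSet;

/** Hypothetical per-document value cache; not part of Lucene. */
public class FieldValueCache {
    // One value per document number: the memory cost mentioned above.
    private final long[] valueByDoc;

    public FieldValueCache(long[] valueByDoc) {
        this.valueByDoc = valueByDoc.clone();
    }

    /** Returns one bit per document whose value lies in [lower, upper]. */
    public BitSet range(long lower, long upper) {
        BitSet hits = new BitSet(valueByDoc.length);
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            long v = valueByDoc[doc];
            if (v >= lower && v <= upper) {
                hits.set(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumed sample data: dates encoded as yyyyMMdd, indexed by doc number.
        long[] dates = {20041101L, 20041115L, 20041123L, 20041201L};
        FieldValueCache cache = new FieldValueCache(dates);

        // The same cache serves any number of different ranges.
        System.out.println(cache.range(20041110L, 20041130L)); // docs 1 and 2
        System.out.println(cache.range(20041201L, 20041231L)); // doc 3
    }
}
```

The array has to be rebuilt whenever the index changes, but each new range
costs only one pass over it. The resulting BitSet is the same kind of
per-document bit mask a filter maintains, so it could presumably be handed
to the search in the same way.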

I did that (before the sort code was introduced) for date queries, in order
to be able to both sort and restrict searches on dates.
But so far I haven't thought about what a general API for such a solution
might look like.

Which way is preferable depends, of course, on a number of questions:
how often is the index modified, are range queries usually run for the
same or for different ranges, how many documents are indexed, and so on.

Morus
  


