lucene-dev mailing list archives

From Michael McCandless <>
Subject Re: [jira] Commented: (LUCENE-1461) Cached filter for a single term field
Date Fri, 21 Nov 2008 19:55:18 GMT

Actually, the purpose of StringIndex is to reduce "sort by string" to  
"sort by int", for exactly the reason you said (compareTo is costly  
for String).

I.e., StringIndex computes the ordinal of each doc's term, so  
sorting by string value reduces to sorting by int ordinal.

I think it's the same thing that DisjointMultiFilter is doing?  Both  
StringIndex and DisjointMultiFilter map Term Text (String) -> ord  
(int) as well as docID -> ord.
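
To make the two mappings concrete, here is a minimal sketch of the idea. It is not the actual FieldCache.StringIndex or DisjointMultiFilter code; the class name OrdIndexSketch and its methods are made up for illustration, and it assumes every doc has exactly one non-null term.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Hypothetical sketch of a term -> ord and docID -> ord index (not Lucene code). */
final class OrdIndexSketch {
    final String[] lookup; // ord -> term text, sorted ascending
    final int[] order;     // docID -> ord of that doc's single term

    OrdIndexSketch(String[] lookup, int[] order) {
        this.lookup = lookup;
        this.order = order;
    }

    /** Builds both maps; assumes termPerDoc[doc] is the doc's one non-null term. */
    static OrdIndexSketch build(String[] termPerDoc) {
        // Sorted, de-duplicated terms: array position is the ordinal.
        TreeSet<String> unique = new TreeSet<>(Arrays.asList(termPerDoc));
        String[] lookup = unique.toArray(new String[0]);
        int[] order = new int[termPerDoc.length];
        for (int doc = 0; doc < termPerDoc.length; doc++) {
            order[doc] = Arrays.binarySearch(lookup, termPerDoc[doc]);
        }
        return new OrdIndexSketch(lookup, order);
    }

    /** Sort comparison by int ordinal; no String.compareTo per comparison. */
    int compareDocs(int docA, int docB) {
        return Integer.compare(order[docA], order[docB]);
    }

    public static void main(String[] args) {
        OrdIndexSketch idx = build(new String[] {"pear", "apple", "pear", "fig"});
        System.out.println(Arrays.toString(idx.lookup));
        System.out.println(Arrays.toString(idx.order));
    }
}
```

The String comparisons happen once, while building the index; every later sort (or range check on ords) only touches ints.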

I do like the idea of using N-bit packing for the docID -> ord map.
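
A rough sketch of what N-bit packing could look like, assuming ords are non-negative ints and N is the number of bits needed for the largest ord. The class PackedOrdsSketch and its methods are hypothetical, not an existing Lucene API.

```java
/** Hypothetical N-bit packed docID -> ord array backed by long[] (not Lucene code). */
final class PackedOrdsSketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    /** size = number of docs; maxOrd = largest ordinal to store (>= 0). */
    PackedOrdsSketch(int size, int maxOrd) {
        // Bits needed to represent maxOrd, with a floor of 1 bit.
        bitsPerValue = Math.max(1, 32 - Integer.numberOfLeadingZeros(maxOrd));
        mask = (1L << bitsPerValue) - 1;
        blocks = new long[(int) (((long) size * bitsPerValue + 63) / 64)];
    }

    void set(int index, int ord) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = ord & mask;
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) { // value straddles two longs: write the high bits too
            int lowBits = bitsPerValue - spill;
            blocks[block + 1] =
                (blocks[block + 1] & ~(mask >>> lowBits)) | (value >>> lowBits);
        }
    }

    int get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) { // pull the high bits from the next long
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return (int) (value & mask);
    }

    int longsUsed() {
        return blocks.length;
    }

    public static void main(String[] args) {
        PackedOrdsSketch packed = new PackedOrdsSketch(100, 5);
        for (int doc = 0; doc < 100; doc++) {
            packed.set(doc, doc % 6);
        }
        System.out.println(packed.get(21) + " stored in " + packed.longsUsed() + " longs");
    }
}
```

With 6 distinct ords this needs 3 bits per doc instead of the 32 of a plain int[], at the cost of some shifting and masking on each lookup.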


Jason Rutherglen wrote:

> The problem with StringIndex is that it uses strings, which are  
> costly for compareTo juxtaposed with numeric compare (I used  
> "juxtaposed" because I originally had "compared with", which was  
> redundant).  It seems helpful to have generic primitive-based  
> StringIndex classes.
> On Fri, Nov 21, 2008 at 2:34 AM, Michael McCandless (JIRA) <> wrote:
>    [ #action_12649639 ]
> Michael McCandless commented on LUCENE-1461:
> --------------------------------------------
> It seems like the core class here (DisjointMultiFilter) is doing the  
> same thing as FieldCache's StringIndex?  Ie, it builds a data  
> structure that maps String <-> ord and docID -> ord.  So maybe we  
> can merge DisjointMultiFilter into the FieldCache API.
> And then RangeMultiFilter is a great addition for quickly "spawning"  
> numerous new RangeFilters, having pulled & stored the StringIndex  
> from the FieldCache?  So I think it should live in core  
>*?  I'd prefer a different name  
> (RangeMultiFilter implies it can filter over multiple ranges) but  
> can't think of one.  Or maybe we absorb it into RangeFilter, as a  
> different "rewrite" method like "useFieldCache=true|false"?
> > Cached filter for a single term field
> > -------------------------------------
> >
> >                 Key: LUCENE-1461
> >                 URL:
> >             Project: Lucene - Java
> >          Issue Type: New Feature
> >            Reporter: Tim Sturge
> >         Attachments:,  
> >
> >
> > These classes implement inexpensive range filtering over a field  
> > containing a single term. They do this by building an integer array  
> > of term numbers (storing the term->number mapping in a TreeMap) and  
> > then implementing a fast integer comparison based DocIdSetIterator.
> > This code is currently being used to do age range filtering, but  
> > could also be used to do other date filtering or in any application  
> > where there need to be multiple filters based on the same single  
> > term field. I have an untested implementation of single term  
> > filtering and have considered but not yet implemented term set  
> > filtering (useful for location based searches) as well.
> > The code here is fairly rough; it works but lacks javadocs and  
> > toString() and hashCode() methods etc. I'm posting it here to  
> > discover if there is other interest in this feature; I don't mind  
> > fixing it up but would hate to go to the effort if it's not going to  
> > make it into Lucene.
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
