lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-1461) Cached filter for a single term field
Date Fri, 26 Jun 2009 15:25:07 GMT


Michael McCandless commented on LUCENE-1461:

bq. This problem has also NumericRangeQuery (see the TermEnum impl there). I could change
both queries to simply return the empty iterator (like when upper<lower)

Right, and I see you've already fixed it!

From your performance runs, looking at the average times, forcing this
filter to take deletions into account made it ~2X slower.  That's
quite costly.

(Though you really should seed the Random() so that the two tests run
precisely the same set of queries against precisely the same index.)
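
A minimal sketch of what seeding could look like, assuming the benchmark
draws random [lower, upper] ranges. The class and method names here are
hypothetical and are not taken from the attached patches:

```java
import java.util.Random;

/** Hypothetical benchmark helper; names are illustrative only. */
class SeededRangeQueries {
    /**
     * Returns `count` inclusive [lower, upper] pairs with values in
     * [0, maxValue). The result depends only on the arguments, so two
     * test runs given the same seed see exactly the same queries.
     */
    static int[][] generate(long seed, int count, int maxValue) {
        Random random = new Random(seed); // fixed seed: same sequence every run
        int[][] ranges = new int[count][2];
        for (int i = 0; i < count; i++) {
            int a = random.nextInt(maxValue);
            int b = random.nextInt(maxValue);
            ranges[i][0] = Math.min(a, b); // lower bound
            ranges[i][1] = Math.max(a, b); // upper bound
        }
        return ranges;
    }
}
```

Calling generate() with the same seed in both the "with deletions" and
"without deletions" runs would make the average times directly comparable.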

I would imagine that for most usage of this filter, taking deletes
into account is not necessary, because it's being used as a filter
with a query whose scorer won't return deleted docs.  Then we've taken
this perf hit for nothing...

Somehow, we really need better control, when creating scorers, on just
when we need and don't need deletions / filters to be "AND'd" in.

Also, this filter isn't good when not many docs pass the filter, since
it's an O(N) scan through the index.  Trie should do much better in
those cases.
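
To illustrate the O(N) cost, here is a rough sketch of the kind of scan
the issue description implies: one term number per document in an int
array, tested with integer comparisons. It is not the code from the
patch; the class name, the BitSet result and the optional deleted-docs
check are assumptions made for the example:

```java
import java.util.BitSet;

/** Hypothetical stand-in for a field-cache style range scan. */
class OrdinalRangeScan {
    /**
     * ords[doc] holds the term number of the single term in that document.
     * Returns the docs whose term number lies in [lowerOrd, upperOrd].
     * Pass deleted == null to skip the per-document deletion check.
     */
    static BitSet scan(int[] ords, int lowerOrd, int upperOrd, BitSet deleted) {
        BitSet hits = new BitSet(ords.length);
        if (upperOrd < lowerOrd) {
            return hits; // empty range: nothing can match
        }
        // Every document is visited, no matter how few of them match.
        for (int doc = 0; doc < ords.length; doc++) {
            int ord = ords[doc];
            boolean inRange = ord >= lowerOrd && ord <= upperOrd;
            if (inRange && (deleted == null || !deleted.get(doc))) {
                hits.set(doc);
            }
        }
        return hits;
    }
}
```

The loop bound is the number of documents, not the number of hits, which
is why a sparse filter pays the same price as a dense one.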

I wonder, if we could make a hybrid approach that eg loads the trie
fields into a fast in-memory postings format (simple int arrays), just
how much faster it'd be.  Ie, if you want to spend memory, spending it
on trie's postings would presumably net the best performance.  Once we
have flexible indexing we could presumably "swap in" an in-RAM
postings impl and then run trie against that.
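
As a very rough sketch of the "simple int arrays" idea: the class below
keeps one int[] of doc IDs per term in memory and answers a range by
walking only the postings of terms inside the range. It is not trie
and not a flexible-indexing implementation; all names are hypothetical,
and it only shows why the cost follows the number of matching postings
rather than the size of the index:

```java
import java.util.BitSet;
import java.util.TreeMap;

/** Hypothetical in-RAM postings: term value -> array of doc IDs. */
class InMemoryPostings {
    private final TreeMap<Integer, int[]> postings = new TreeMap<>();

    /** Registers the doc IDs that contain the given term value. */
    void add(int term, int... docs) {
        postings.put(term, docs);
    }

    /** Returns the docs containing any term in the inclusive range [lower, upper]. */
    BitSet range(int lower, int upper) {
        BitSet hits = new BitSet();
        if (upper < lower) {
            return hits; // empty range
        }
        // Only terms inside the range are visited; other postings are never touched.
        for (int[] docs : postings.subMap(lower, true, upper, true).values()) {
            for (int doc : docs) {
                hits.set(doc);
            }
        }
        return hits;
    }
}
```

With trie, the range would be covered by a small number of precomputed
terms, so the outer loop would stay short even for wide ranges.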

> Cached filter for a single term field
> -------------------------------------
>                 Key: LUCENE-1461
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>         Attachments:, FieldCacheRangeFilter.patch, LUCENE-1461.patch,
LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461a.patch,
LUCENE-1461b.patch, LUCENE-1461c.patch,,,,, TestFieldCacheRangeFilter.patch
> These classes implement inexpensive range filtering over a field containing a single
term. They do this by building an integer array of term numbers (storing the term->number
mapping in a TreeMap) and then implementing a fast integer comparison based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also be used to
do other date filtering or in any application where there need to be multiple filters based
on the same single term field. I have an untested implementation of single term filtering
and have considered but not yet implemented term set filtering (useful for location based
searches) as well. 
> The code here is fairly rough; it works but lacks javadocs and toString() and hashCode()
methods etc. I'm posting it here to discover if there is other interest in this feature; I
don't mind fixing it up but would hate to go to the effort if it's not going to make it into

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
