lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] Updated: (LUCENE-1461) Cached filter for a single term field
Date Fri, 26 Jun 2009 22:23:47 GMT


Uwe Schindler updated LUCENE-1461:

    Attachment: LUCENE-1461.patch

Attached is a new patch that has two DocIdSetIterator implementations, one with TermDocs and one
without. The TermDocs one is chosen only for numeric types, and only if the reader contains deletions
*and* 0 is inside the range. For all other cases (including StringIndex), the simple DocIdSetIterator
using the counter is used.
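
As a rough illustration of the selection rule above (this is not code from the patch), the
sketch below uses plain Java. ToyReader, IteratorChoice and their methods are hypothetical names,
not Lucene classes. It also assumes that a deleted document's slot in the cached value array
holds the default value 0, which would explain why deletions only matter when 0 is inside the
range; the message itself does not state that reason.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical stand-in for an index reader; NOT Lucene's IndexReader. */
final class ToyReader {
    final int maxDoc;
    final BitSet deleted; // a set bit marks a deleted document

    ToyReader(int maxDoc, BitSet deleted) {
        this.maxDoc = maxDoc;
        this.deleted = deleted;
    }

    boolean hasDeletions() {
        return !deleted.isEmpty();
    }
}

final class IteratorChoice {
    /** The rule from the message: numeric type, reader has deletions, and 0 lies in the range. */
    static boolean needsDeletionAwareIterator(ToyReader reader, boolean numeric,
                                              long lower, long upper) {
        boolean zeroInRange = lower <= 0 && upper >= 0;
        return numeric && reader.hasDeletions() && zeroInRange;
    }

    /** Collects doc ids whose cached value lies in [lower, upper] (numeric case). */
    static int[] matchingDocs(ToyReader reader, long[] values, long lower, long upper) {
        boolean skipDeleted = needsDeletionAwareIterator(reader, true, lower, upper);
        int[] out = new int[reader.maxDoc];
        int n = 0;
        for (int doc = 0; doc < reader.maxDoc; doc++) {
            // Assumption: deleted docs hold 0 in the cache, so they can only
            // produce false matches when 0 is inside the range.
            if (skipDeleted && reader.deleted.get(doc)) {
                continue;
            }
            if (values[doc] >= lower && values[doc] <= upper) {
                out[n++] = doc; // simple counter-based iteration
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```

In this toy model the deletion check is paid only in the one case where it can change the
result; every other range goes through the plain counter loop.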

For more code reuse, all range implementations now use the same abstract DocIdSet implementation
and only override matchDoc(). My tests showed that using this method does not affect performance
(the method is inlined); the original StringIndex implementation is as fast as the new one with matchDoc().
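
The shared-base-class idea can be sketched as a template method. The code below is only an
illustration in plain Java, not the patch: RangeDocIdSet, RangeSets, nextMatch(), forInts() and
forOrds() are hypothetical names, and only the name matchDoc() comes from the message.

```java
/** Hypothetical base class: iteration lives here, subclasses supply only matchDoc(). */
abstract class RangeDocIdSet {
    private final int maxDoc;

    RangeDocIdSet(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** Per-type range check; the only method a concrete range has to override. */
    abstract boolean matchDoc(int doc);

    /** Returns the first matching doc id >= from, or -1 if there is none. */
    int nextMatch(int from) {
        for (int doc = Math.max(from, 0); doc < maxDoc; doc++) {
            if (matchDoc(doc)) {
                return doc;
            }
        }
        return -1;
    }
}

final class RangeSets {
    /** Range over cached int values, both bounds inclusive. */
    static RangeDocIdSet forInts(final int[] values, final int lower, final int upper) {
        return new RangeDocIdSet(values.length) {
            @Override
            boolean matchDoc(int doc) {
                return values[doc] >= lower && values[doc] <= upper;
            }
        };
    }

    /** Range over term ordinals (a string-index style array), both bounds inclusive. */
    static RangeDocIdSet forOrds(final int[] order, final int lowerOrd, final int upperOrd) {
        return new RangeDocIdSet(order.length) {
            @Override
            boolean matchDoc(int doc) {
                return order[doc] >= lowerOrd && order[doc] <= upperOrd;
            }
        };
    }
}
```

Each matchDoc() body is a small, monomorphic comparison, which is the kind of method a JIT
compiler can usually inline. That fits the observation above that the extra call costs nothing
measurable, though this sketch makes no performance claim of its own.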

This patch also restores the original handling of the return value of binarySearch (which
can be negative).
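
To show why the negative return value matters, here is a small sketch using
java.util.Arrays.binarySearch, which returns -(insertionPoint) - 1 when the key is absent. It
is not the lookup used in the patch; RangeBounds, lowerIndex() and upperIndex() are
hypothetical helpers that turn a range bound which is not itself an indexed term into an
inclusive array index.

```java
import java.util.Arrays;

final class RangeBounds {
    /** Index of the first term >= key (or > key when inclusive is false). */
    static int lowerIndex(String[] sortedTerms, String key, boolean inclusive) {
        int pos = Arrays.binarySearch(sortedTerms, key);
        if (pos >= 0) {
            return inclusive ? pos : pos + 1; // key is an existing term
        }
        return -pos - 1; // key absent: start at the insertion point
    }

    /** Index of the last term <= key (or < key when inclusive is false). */
    static int upperIndex(String[] sortedTerms, String key, boolean inclusive) {
        int pos = Arrays.binarySearch(sortedTerms, key);
        if (pos >= 0) {
            return inclusive ? pos : pos - 1; // key is an existing term
        }
        return -pos - 2; // key absent: stop just before the insertion point
    }
}
```

With the sorted terms {"apple", "cherry", "mango"}, a lower bound of "banana" resolves to index
1 and an upper bound of "banana" to index 0. An upper index smaller than the lower index means
the range is empty.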

Here again the comparison:

*Version with TermDocs:*
loading field cache
time: 6767.23131 ms
Warming searcher...
avg number of terms: 378.75
TRIE: best time=5.232229 ms; worst time=553.791334 ms; avg=250.4418579 ms; sum=31996909
FIELDCACHE: best time=212.763912 ms; worst time=357.100414 ms; avg=279.75582110000005 ms;

*Version without (because index in testcase has no deletions):*
loading field cache
time: 6463.311678 ms
Warming searcher...
avg number of terms: 378.75
TRIE: best time=4.539963 ms; worst time=581.657446 ms; avg=246.58688465 ms; sum=31996909
FIELDCACHE: best time=64.747614 ms; worst time=211.557335 ms; avg=139.16517340000001 ms; sum=31996909

(My T60 was not on battery, which is why the measurement with TermDocs and the FieldCache loading
were faster than before.) Both tests, before and after the optimization, were done with the same
settings, and the random seed was identical (0L).

> Cached filter for a single term field
> -------------------------------------
>                 Key: LUCENE-1461
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>         Attachments:, FieldCacheRangeFilter.patch, LUCENE-1461.patch,
LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch,
LUCENE-1461.patch, LUCENE-1461a.patch, LUCENE-1461b.patch, LUCENE-1461c.patch,,,,, TestFieldCacheRangeFilter.patch
> These classes implement inexpensive range filtering over a field containing a single
term. They do this by building an integer array of term numbers (storing the term->number
mapping in a TreeMap) and then implementing a fast integer comparison based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also be used to
do other date filtering or in any application where there need to be multiple filters based
on the same single term field. I have an untested implementation of single term filtering
and have considered but not yet implemented term set filtering (useful for location based
searches) as well. 
> The code here is fairly rough; it works but lacks javadocs and toString() and hashCode()
methods etc. I'm posting it here to discover if there is other interest in this feature; I
don't mind fixing it up but would hate to go to the effort if it's not going to make it into
