lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] Commented: (LUCENE-1461) Cached filter for a single term field
Date Fri, 26 Jun 2009 10:41:07 GMT


Uwe Schindler commented on LUCENE-1461:

I ran some performance tests comparing this filter with TrieRange (precStep 8) on an index of
5 million documents with homogeneously distributed int values from Integer.MIN_VALUE to
Integer.MAX_VALUE, using 200 queries with random bounds in the same range. The platform was
Win32 with 1.5 GB RAM on my Thinkpad T60 Core Duo (not 2 Duo!), Java 1.5:

loading field cache
time: 11826.602264 ms
Warming searcher...
avg number of terms: 414.365
TRIE: best time=4.51482 ms; worst time=1560.544985 ms; avg=470.56886981499997 ms; sum=323328111
FIELDCACHE: best time=314.611773 ms; worst time=878.438461 ms; avg=511.93189495499996 ms;

This test shows that, with a well-warmed searcher and the whole index in the OS cache, both
filters are about the same in speed. A constant-score conventional range query is far behind
(about 10 to 1000 times slower, depending on how far apart the random range bounds are).
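
As a rough illustration of the measurement described above (random range bounds, then best, worst and average time plus a hit-count sum), here is a minimal, Lucene-free sketch. The names RangeBenchmarkSketch, RangeCounter and Stats are hypothetical and not part of the patch; the linear scan only stands in for a real filter, and the data set is much smaller than the 5 million values used in the test.

```java
import java.util.Random;

/** Hypothetical, Lucene-free sketch of the timing loop described above. */
final class RangeBenchmarkSketch {

    /** Summary of one run: times in milliseconds plus a hit-count checksum. */
    static final class Stats {
        double bestMs = Double.MAX_VALUE;
        double worstMs = 0.0;
        double totalMs = 0.0;
        long sum = 0;     // total hits over all queries, used to compare implementations
        int queries = 0;

        double avgMs() { return queries == 0 ? 0.0 : totalMs / queries; }
    }

    /** Stand-in for a range filter: counts values v with lower <= v <= upper. */
    interface RangeCounter {
        int count(int lower, int upper);
    }

    /** Runs the given number of random range queries; the seed makes the bounds reproducible. */
    static Stats run(RangeCounter counter, int queries, long seed) {
        Random rnd = new Random(seed);
        Stats s = new Stats();
        for (int i = 0; i < queries; i++) {
            int a = rnd.nextInt();
            int b = rnd.nextInt();
            int lower = Math.min(a, b);
            int upper = Math.max(a, b);
            long start = System.nanoTime();
            int hits = counter.count(lower, upper);
            double ms = (System.nanoTime() - start) / 1e6;
            s.bestMs = Math.min(s.bestMs, ms);
            s.worstMs = Math.max(s.worstMs, ms);
            s.totalMs += ms;
            s.sum += hits;
            s.queries++;
        }
        return s;
    }

    public static void main(String[] args) {
        // Uniformly distributed ints over the full int range (assumed data set size).
        Random rnd = new Random(42);
        int[] values = new int[100_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = rnd.nextInt();
        }
        RangeCounter linearScan = (lower, upper) -> {
            int n = 0;
            for (int v : values) {
                if (v >= lower && v <= upper) n++;
            }
            return n;
        };
        Stats s = run(linearScan, 200, 7L);
        System.out.printf("best time=%f ms; worst time=%f ms; avg=%f ms; sum=%d%n",
                s.bestMs, s.worstMs, s.avgMs(), s.sum);
    }
}
```

Running two implementations with the same seed gives both the same bounds, so an identical sum is a cheap check that they returned the same number of hits.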

The same test with the old patch (which uses no TermDocs) and a completely separate loop (no
matchDoc() method call); only here does the FieldCache filter beat the trie filter:

loading field cache
time: 12134.143027 ms
Warming searcher...
avg number of terms: 403.785
TRIE: best time=3.890159 ms; worst time=1266.979462 ms; avg=453.553236545 ms; sum=308154314
FIELDCACHE: best time=84.019897 ms; worst time=434.558023 ms; avg=235.91554798500002 ms; sum=308154314

Both test runs show that the queries work correctly (the sum is identical, which shows that
both returned exactly the same hits).

In all cases I would still prefer TrieRange (hihi), especially because of the long warming
time for the field cache. TrieRange also gets somewhat better with lower precSteps, though not
by much (in constant score mode the bit sets are the bigger problem).

> Cached filter for a single term field
> -------------------------------------
>                 Key: LUCENE-1461
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>         Attachments:, FieldCacheRangeFilter.patch, LUCENE-1461.patch,
LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461a.patch,
LUCENE-1461b.patch, LUCENE-1461c.patch,,,,
> These classes implement inexpensive range filtering over a field containing a single
> term. They do this by building an integer array of term numbers (storing the term->number
> mapping in a TreeMap) and then implementing a fast, integer-comparison-based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also be used to
> do other date filtering or in any application where there need to be multiple filters based
> on the same single term field. I have an untested implementation of single term filtering
> and have considered but not yet implemented term set filtering (useful for location based
> searches) as well.
> The code here is fairly rough; it works but lacks javadocs and toString() and hashCode()
> methods etc. I'm posting it here to discover if there is other interest in this feature; I
> don't mind fixing it up but would hate to go to the effort if it's not going to make it into
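
The approach in the issue description (number the terms in sorted order via a TreeMap, store one term number per document, then filter by integer comparison) might be sketched roughly as follows. This is not the code from the attached patches: the class TermNumberRangeSketch and its methods are hypothetical, it uses no Lucene classes, and it returns a plain list of document ids where the patch describes an iterator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, Lucene-free sketch of range filtering via per-document term numbers. */
final class TermNumberRangeSketch {
    private final TreeMap<String, Integer> termToNumber = new TreeMap<String, Integer>();
    private final int[] docTermNumber; // docTermNumber[doc] = number of that document's single term

    /** docTerms[doc] is the single term of the field for document doc. */
    TermNumberRangeSketch(String[] docTerms) {
        for (String term : docTerms) {
            termToNumber.put(term, 0); // placeholder value, numbered below
        }
        int next = 0;
        // A TreeMap iterates in sorted term order, so the numbers preserve that order.
        for (Map.Entry<String, Integer> e : termToNumber.entrySet()) {
            e.setValue(next++);
        }
        docTermNumber = new int[docTerms.length];
        for (int doc = 0; doc < docTerms.length; doc++) {
            docTermNumber[doc] = termToNumber.get(docTerms[doc]);
        }
    }

    /** Returns the ids of documents whose term lies in [lowerTerm, upperTerm], both inclusive. */
    List<Integer> matchingDocs(String lowerTerm, String upperTerm) {
        List<Integer> hits = new ArrayList<Integer>();
        // Resolve the bounds to term numbers once; the bounds need not be indexed terms.
        Map.Entry<String, Integer> lo = termToNumber.ceilingEntry(lowerTerm);
        Map.Entry<String, Integer> hi = termToNumber.floorEntry(upperTerm);
        if (lo == null || hi == null) {
            return hits; // the range lies entirely above or below all terms
        }
        int min = lo.getValue();
        int max = hi.getValue();
        // The per-document work is just two integer comparisons.
        for (int doc = 0; doc < docTermNumber.length; doc++) {
            int n = docTermNumber[doc];
            if (n >= min && n <= max) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Ages as zero-padded strings so that lexicographic order matches numeric order.
        String[] ages = {"030", "018", "045", "018", "062"};
        TermNumberRangeSketch filter = new TermNumberRangeSketch(ages);
        System.out.println(filter.matchingDocs("020", "050")); // prints [0, 2]
    }
}
```

Building the array is the expensive step, and it is done once per field; every filter on that field can then reuse it.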

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
