lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Sturge (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1461) Cached filter for a single term field
Date Wed, 19 Nov 2008 01:27:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648869#action_12648869
] 

Tim Sturge commented on LUCENE-1461:
------------------------------------

Here's some benchmark data to demonstrate the utility. Results on a 45M document index:

Firstly without an age constraint as a baseline:

Query "+name:tim" 
startup: 0 
Hits: 15089
first query: 1004
100 queries: 132 (1.32 msec per query)

Now with a cached filter. This is ideal from a speed standpoint but as with most range based
queries there are too many possible start/end combinations to cache all the filters.

Query "+name:tim age:[18 TO 35]" (ConstantScoreQuery on cached RangeFilter)
startup: 3
Hits: 11156
first query: 1830
100 queries: 287 (2.87 msec per query)

Now with an uncached filter. This is awful.

Query "+name:tim age:[18 TO 35]" (uncached ConstantScoreRangeQuery)
startup: 3
Hits: 11156
first query: 1665
100 queries: 51862 (yes, 518 msec per query, 200x slower)

A RangeQuery is slightly better but still bad (and has a different result set)

Query "+name:tim age:[18 TO 35]" (uncached RangeQuery)
startup: 0
Hits: 10147
first query: 1517
100 queries: 27157 (271 msec is 100x slower than the filter)

Now with the prebuilt column stride filter:

Query "+name:tim age:[18 TO 35]" (ConstantScoreQuery on prebuilt column stride filter)
startup: 2811
Hits: 11156
first query: 1395
100 queries: 441 (back down to 4.41msec per query)

This is less than 2x slower than the dedicated bitset and more than 50x faster than the range
boolean query.



> Cached filter for a single term field
> -------------------------------------
>
>                 Key: LUCENE-1461
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1461
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
>         Attachments: DisjointMultiFilter.java, RangeMultiFilter.java
>
>
> These classes implement inexpensive range filtering over a field containing a single
term. They do this by building an integer array of term numbers (storing the term->number
mapping in a TreeMap) and then implementing a fast integer comparison based DocSetIdIterator.
> This code is currently being used to do age range filtering, but could also be used to
do other date filtering or in any application where there need to be multiple filters based
on the same single term field. I have an untested implementation of single term filtering
and have considered but not yet implemented term set filtering (useful for location based
searches) as well. 
> The code here is fairly rough; it works but lacks javadocs and toString() and hashCode()
methods etc. I'm posting it here to discover if there is other interest in this feature; I
don't mind fixing it up but would hate to go to the effort if it's not going to make it into
Lucene.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message