lucene-dev mailing list archives

From "Paul Elschot (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1461) Cached filter for a single term field
Date Thu, 20 Nov 2008 08:06:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649298#action_12649298 ]

Paul Elschot commented on LUCENE-1461:
--------------------------------------

For fields whose number of distinct values fits into a short (at most 2^16 = 65536),
using a short[] would make sense, I think. Since the number of distinct field values can simply
be counted in this context, the int[] could simply be replaced by a short[] in that case.
But that would only reduce space, and only by a factor of two.
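
A minimal sketch of the short[] idea, assuming the per-document term numbers are already
available as an int[]. The class and method names here are hypothetical and not taken from
the attached filters. Java shorts are signed, so the values are read back with a mask to
cover the full 0..65535 range.

```java
/** Hypothetical helper: per-document term numbers stored in a short[] instead of an int[]. */
final class ShortOrdinals {
    static final int MAX_DISTINCT = 1 << 16; // 65536 distinct values fit in 16 bits

    /** Packs term numbers into a short[]; rejects fields with too many distinct values. */
    static short[] pack(int[] ordinals, int distinctCount) {
        if (distinctCount > MAX_DISTINCT) {
            throw new IllegalArgumentException("too many distinct values: " + distinctCount);
        }
        short[] packed = new short[ordinals.length];
        for (int doc = 0; doc < ordinals.length; doc++) {
            packed[doc] = (short) ordinals[doc]; // keeps the low 16 bits
        }
        return packed;
    }

    /** Reads a term number back as an unsigned value in 0..65535. */
    static int ordinal(short[] packed, int doc) {
        return packed[doc] & 0xFFFF;
    }

    /** Inclusive range test on the term number of one document. */
    static boolean inRange(short[] packed, int doc, int lower, int upper) {
        int ord = ordinal(packed, doc);
        return lower <= ord && ord <= upper;
    }
}
```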

For a set based query, the problem boils down to doing integer set membership in the iterator.
For small sets, binary search should be fine. For larger ones an OpenBitSet would be preferable,
but in this context that would only be feasible when the number of different terms is a lot
smaller than the number of documents in the index.
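
The two membership strategies could look roughly like this. It is only a sketch:
java.util.BitSet stands in for OpenBitSet so that the example has no Lucene dependency, and
the class and method names are hypothetical. The bit set is indexed by term number, not by
document number.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical sketch of term-set membership on per-document term numbers. */
final class OrdinalSetMembership {

    /** Small sets: binary search in a sorted array of accepted term numbers. */
    static boolean inSortedArray(int[] sortedAccepted, int ordinal) {
        return Arrays.binarySearch(sortedAccepted, ordinal) >= 0;
    }

    /** Larger sets: one bit per distinct term number. */
    static BitSet toBitSet(int[] accepted, int distinctCount) {
        BitSet bits = new BitSet(distinctCount);
        for (int ordinal : accepted) {
            bits.set(ordinal);
        }
        return bits;
    }

    /** Returns the first document at or after target whose term number is accepted, or -1. */
    static int nextMatch(int[] docOrdinals, BitSet accepted, int target) {
        for (int doc = target; doc < docOrdinals.length; doc++) {
            if (accepted.get(docOrdinals[doc])) {
                return doc;
            }
        }
        return -1;
    }
}
```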

For location grid-blocks one needs to deal with more than one dimension. In such cases my
first thought is to use indexed hierarchical prefixes in each dimension, because this allows
skipTo() to be used on the documents for the intersection between the dimensions. (But there
may be better ways; it has been a long time since I looked at the literature on this.)
Do you need to index separate lower bounds and upper bounds on the data? That would complicate
things.
Without indexed bounds (i.e. point data only) for each dimension it could make sense to use
this multi range filter.
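
One possible reading of "indexed hierarchical prefixes", sketched under my own assumptions:
each grid coordinate is written as a fixed-width binary string, and every prefix of that
string is a candidate term for the dimension. Nearby coordinates then share their coarser
prefixes. The encoding, the class and the method name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: hierarchical prefix terms for one grid coordinate. */
final class PrefixTerms {

    /** Returns the prefixes of length 1..bits of value written as a bits-wide binary string. */
    static List<String> prefixes(int value, int bits) {
        if (bits < 1 || bits > 31 || value < 0 || value >= (1 << bits)) {
            throw new IllegalArgumentException("value does not fit in " + bits + " bits");
        }
        String binary = Integer.toBinaryString(value);
        StringBuilder padded = new StringBuilder();
        for (int i = binary.length(); i < bits; i++) {
            padded.append('0'); // left-pad so all coordinates have the same width
        }
        padded.append(binary);
        List<String> terms = new ArrayList<>();
        for (int len = 1; len <= bits; len++) {
            terms.add(padded.substring(0, len)); // coarse to fine
        }
        return terms;
    }
}
```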



> Cached filter for a single term field
> -------------------------------------
>
>                 Key: LUCENE-1461
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1461
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
>         Attachments: DisjointMultiFilter.java, RangeMultiFilter.java
>
>
> These classes implement inexpensive range filtering over a field containing a single
> term. They do this by building an integer array of term numbers (storing the term->number
> mapping in a TreeMap) and then implementing a fast integer comparison based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also be used to
> do other date filtering or in any application where there need to be multiple filters based
> on the same single term field. I have an untested implementation of single term filtering
> and have considered but not yet implemented term set filtering (useful for location based
> searches) as well.
> The code here is fairly rough; it works but lacks javadocs and toString() and hashCode()
> methods etc. I'm posting it here to discover if there is other interest in this feature; I
> don't mind fixing it up but would hate to go to the effort if it's not going to make it into
> Lucene.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

