lucene-dev mailing list archives

From "Earwin Burrfoot (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1461) Cached filter for a single term field
Date Wed, 26 Nov 2008 12:04:47 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650974#action_12650974 ]

Earwin Burrfoot commented on LUCENE-1461:
-----------------------------------------

bq. RangeQuery no longer relies on the sort order of the terms, which means tricks like padding
numeric terms are no longer needed, I think?
I do rely on sort order for speed and simplicity, though I never used padding for numeric/date
terms :) All dates/numbers/somethingelsespecial are converted to strings using a base-2^15
encoding (to keep the high bit at 0, since 0xFFFF is used somewhere within Lucene's intestines as
an EOS marker, darn it!), plus an adjustment to preserve sort order for negative numbers in the
face of Java's unsigned char. This transformation is insanely fast and produces well-compressed
results (I have FAT read->mem/write->mem+disk indexes).
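
For readers who want to see the idea concretely, here is a minimal sketch of an order-preserving base-2^15 encoding for a Java long. It is not the code described in this comment (which was not posted); the class name SortableLongCodec and the fixed five-char layout are assumptions made for illustration. It assumes terms are compared the way String.compareTo compares them, char by char.

```java
// Hypothetical sketch: not the commenter's actual code, which is not shown in this thread.
final class SortableLongCodec {
    private static final int BITS = 15;               // payload bits per char, high bit stays 0
    private static final int MASK = (1 << BITS) - 1;  // 0x7FFF
    private static final int CHARS = 5;               // ceil(64 / 15) chars cover a 64-bit long

    // Encodes a long so that String.compareTo order matches numeric order.
    static String encode(long value) {
        // Flipping the sign bit maps signed order onto unsigned order,
        // so negative numbers sort before positive ones.
        long u = value ^ Long.MIN_VALUE;
        char[] out = new char[CHARS];
        for (int i = CHARS - 1; i >= 0; i--) {
            out[i] = (char) (u & MASK);  // every char is <= 0x7FFF
            u >>>= BITS;
        }
        return new String(out);
    }

    // Inverse of encode, so the mapping is a bijection on longs.
    static long decode(String s) {
        long u = 0;
        for (int i = 0; i < CHARS; i++) {
            u = (u << BITS) | s.charAt(i);
        }
        return u ^ Long.MIN_VALUE;
    }
}
```

Because every encoded term has the same length and no char exceeds 0x7FFF, comparing two encoded strings gives the same result as comparing the original longs, and decode(encode(x)) returns x for any long.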

bq. b) prefix the terms with a precision marker. The prefix is important for the sort order,
so that all terms of one precision are in one "bunch" and not distributed between higher precision
terms.
And you can no longer use this field for sorting, as it has more than one term for each document.

bq. For my last implementation, based on filters I did not use a BooleanQuery with OR'ed ranges
because of resource usage
I'm using filters here too.

bq. Allowing each field to provide its own Comparator may still be helpful then
But you still store strings in the index. So essentially you'll convert your value from T
to String, store it, retrieve it, convert back to T in such a custom comparator, and finally
compare. Why should I need that second conversion and custom comparators, if I can have an
order-preserving bijective T<->String relation?



> Cached filter for a single term field
> -------------------------------------
>
>                 Key: LUCENE-1461
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1461
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>         Attachments: DisjointMultiFilter.java, LUCENE-1461.patch, LUCENE-1461a.patch,
LUCENE-1461b.patch, RangeMultiFilter.java, RangeMultiFilter.java, TermMultiFilter.java
>
>
> These classes implement inexpensive range filtering over a field containing a single
term. They do this by building an integer array of term numbers (storing the term->number
mapping in a TreeMap) and then implementing a fast integer-comparison-based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also be used to
do other date filtering or in any application where there need to be multiple filters based
on the same single term field. I have an untested implementation of single term filtering
and have considered but not yet implemented term set filtering (useful for location based
searches) as well. 
> The code here is fairly rough; it works but lacks javadocs and toString() and hashCode()
methods etc. I'm posting it here to discover if there is other interest in this feature; I
don't mind fixing it up but would hate to go to the effort if it's not going to make it into
Lucene.
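
The approach in the issue description above can be illustrated with a rough sketch in plain Java. This is not the attached RangeMultiFilter code, and it does not use any Lucene API; the class name TermOrdinalRangeSketch, the String[] input, and the BitSet result are assumptions made for illustration. It assumes each document has exactly one term in the field.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of the term-number idea; not the code attached to the issue.
final class TermOrdinalRangeSketch {
    private final TreeMap<String, Integer> termToOrd = new TreeMap<>(); // term -> number, in sort order
    private final int[] docToOrd;                                       // one term number per document

    TermOrdinalRangeSketch(String[] termPerDoc) {
        // Number the distinct terms in sorted order.
        int ord = 0;
        for (String term : new TreeSet<>(Arrays.asList(termPerDoc))) {
            termToOrd.put(term, ord++);
        }
        // Record each document's term number once, up front.
        docToOrd = new int[termPerDoc.length];
        for (int doc = 0; doc < termPerDoc.length; doc++) {
            docToOrd[doc] = termToOrd.get(termPerDoc[doc]);
        }
    }

    // Returns the documents whose term lies in [lower, upper], both inclusive.
    BitSet range(String lower, String upper) {
        BitSet hits = new BitSet(docToOrd.length);
        Map.Entry<String, Integer> lo = termToOrd.ceilingEntry(lower); // first term >= lower
        Map.Entry<String, Integer> hi = termToOrd.floorEntry(upper);   // last term <= upper
        if (lo == null || hi == null) {
            return hits; // the range lies entirely outside the known terms
        }
        int loOrd = lo.getValue();
        int hiOrd = hi.getValue();
        // Per-document work is two integer comparisons, with no string compares.
        for (int doc = 0; doc < docToOrd.length; doc++) {
            int o = docToOrd[doc];
            if (o >= loOrd && o <= hiOrd) {
                hits.set(doc);
            }
        }
        return hits;
    }
}
```

The array and map are built once and can then serve many different ranges over the same field, which is the case the description has in mind (for example several age ranges).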

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

