lucene-dev mailing list archives

From "Earwin Burrfoot (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1470) Add TrieRangeQuery to contrib
Date Wed, 26 Nov 2008 16:48:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651056#action_12651056
] 

Earwin Burrfoot commented on LUCENE-1470:
-----------------------------------------

bq. in base 2^15, you only have 4 precisions and some more bits
We have slightly different approaches. Yours is universal, while mine requires hand-tuning.
I have an abstract FastRangeFilter, which I extend by specifying functions that lower precision
before encoding, so I can have any required number of type/field-dependent precision levels.
For further ease of use, it is extended by FastDateRangeFilter, which accepts an array
of date parts, like {HOUR_OF_DAY, DAY_OF_MONTH}.
That allows me to exploit known statistical properties of my requests/data: for example, most
date ranges are rolling day/week/month/3-month windows, and salaries tend to cluster around
certain values.
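
The precision-lowering step described above could look roughly like the sketch below. It is not the actual FastRangeFilter/FastDateRangeFilter code, which is not shown in this message: the class name DatePrecisionSketch, both method names, and the choice of UTC are assumptions made for illustration. Each requested date part produces one coarser copy of the timestamp, truncated to the start of that part.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

/** Hypothetical sketch of per-field precision lowering; not the real FastDateRangeFilter. */
final class DatePrecisionSketch {

    // Calendar fields from finest to coarsest. Truncating to a date part
    // resets every field that is finer than that part.
    private static final int[] FINE_TO_COARSE = {
        Calendar.MILLISECOND, Calendar.SECOND, Calendar.MINUTE,
        Calendar.HOUR_OF_DAY, Calendar.DAY_OF_MONTH, Calendar.MONTH, Calendar.YEAR
    };

    /** Truncates a timestamp (interpreted in UTC, an assumption) to the start of the given date part. */
    static long lowerPrecision(long millis, int datePart) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.setTimeInMillis(millis);
        for (int field : FINE_TO_COARSE) {
            if (field == datePart) {
                return cal.getTimeInMillis();
            }
            // Reset the finer field to its minimum (0 for time fields, 1 for DAY_OF_MONTH).
            cal.set(field, cal.getMinimum(field));
        }
        throw new IllegalArgumentException("unsupported date part: " + datePart);
    }

    /** Returns the full-precision value followed by one truncated value per date part. */
    static long[] precisionLevels(long millis, int... dateParts) {
        long[] levels = new long[dateParts.length + 1];
        levels[0] = millis;
        for (int i = 0; i < dateParts.length; i++) {
            levels[i + 1] = lowerPrecision(millis, dateParts[i]);
        }
        return levels;
    }
}
```

Calling precisionLevels(timestamp, Calendar.HOUR_OF_DAY, Calendar.DAY_OF_MONTH) yields three values to index: the exact timestamp, the start of its hour, and the start of its day. A rolling one-day window can then be answered mostly from the day-level and hour-level values.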

bq. Java sometimes has strange string comparisons, and I did not want to walk into incompatibilities
with String.compareTo()
The fact that Java strings are sequences of 16-bit UTF-16 code units is written into the
Java Language Specification, and String.compareTo() compares them one code unit at a time,
so you can trust it: anything that blows up there is not Java(tm). Problems usually come
from within libraries.
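
A small sketch of what that ordering means in practice: String.compareTo() is deterministic, but it orders by 16-bit code unit, which differs from code point order only when supplementary characters (encoded as surrogate pairs) are involved. The class and the helper compareByCodePoint are hypothetical names added here for contrast; they are not part of the JDK or of Lucene.

```java
/** Illustrative only: contrasts code-unit order (String.compareTo) with code-point order. */
final class CompareToSketch {

    /** Hypothetical helper: compares two strings by Unicode code point instead of by char. */
    static int compareByCodePoint(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    /** U+FFFF, the largest code point that fits in a single char. */
    static final String LAST_BMP = "\uFFFF";

    /** U+10000, the first supplementary code point, stored as the pair D800 DC00. */
    static final String FIRST_SUPPLEMENTARY = new String(Character.toChars(0x10000));
}
```

Here LAST_BMP.compareTo(FIRST_SUPPLEMENTARY) is positive, because 0xFFFF is greater than the leading surrogate 0xD800, while compareByCodePoint(LAST_BMP, FIRST_SUPPLEMENTARY) is negative. For terms built only from characters below the surrogate range, the two orders agree.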

bq. I have not tried to sort the results on my combined, prefixed field for a long time, and
you are right: it is not possible if all the different precisions are in the same field.
Actually, right now I'm -stealing- borrowing your idea of storing various precisions in the
same field. I have my own custom field cache, so the broken sort can be fixed easily. Having everything
in the same field allows you to do exactly one pass through TermEnum/TermDocs, and that is
what I'm going to exploit :)

> Add TrieRangeQuery to contrib
> -----------------------------
>
>                 Key: LUCENE-1470
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1470
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>    Affects Versions: 2.4
>            Reporter: Uwe Schindler
>         Attachments: LUCENE-1470.patch
>
>
> According to the thread in java-dev (http://www.gossamer-threads.com/lists/lucene/java-dev/67807
> and http://www.gossamer-threads.com/lists/lucene/java-dev/67839), I want to include my fast
> numerical range query implementation into lucene contrib-queries.
> I implemented (based on RangeFilter) another approach for faster
> RangeQueries, based on longs stored in the index in a special format.
> The idea behind this is to store the longs at different precisions in the index
> and partition the query range in such a way that the outer boundaries are
> searched using terms from the highest precision, but the center of the search
> range with lower precision. The implementation stores the longs in 8
> different precisions (using a class called TrieUtils). It also has support
> for doubles, using the IEEE 754 floating-point "double format" bit layout
> with some bit mappings to make them binary sortable. The approach is used in
> rather big indexes; query times are <<100 ms (!) even on low-performance desktop
> computers for very big ranges on indexes with 500000 docs.
> I called this RangeQuery variant and format "TrieRangeQuery" because
> the idea looks like the well-known trie structures (it is not identical
> to real tries, but the algorithms are related).
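
The range partitioning described in the issue can be sketched as follows. This is an illustration written for this message, not the TrieUtils class from LUCENE-1470.patch: the names TrieRangeSketch, SubRange, splitRange and doubleToSortableLong are assumptions, and so is the exact double mapping (flip the non-sign bits of negative values). A precision is identified by a shift, meaning the lowest `shift` bits of the value are dropped. With a step of 8 bits, a 64-bit long has the 8 precisions mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of trie-style range splitting; not the TrieUtils from the patch. */
final class TrieRangeSketch {

    /** Matches every value v whose shifted form lies in [lo >> shift, hi >> shift]. */
    static final class SubRange {
        final long lo;
        final long hi;
        final int shift;

        SubRange(long lo, long hi, int shift) {
            this.lo = lo;
            this.hi = hi;
            this.shift = shift;
        }

        boolean matches(long value) {
            long v = value >> shift;
            return v >= (lo >> shift) && v <= (hi >> shift);
        }
    }

    /** One possible mapping of doubles to longs whose signed order matches the double order. */
    static long doubleToSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // Negative doubles grow in magnitude as their bits grow, so invert their non-sign bits.
        return bits < 0 ? bits ^ 0x7fffffffffffffffL : bits;
    }

    /**
     * Splits [min, max] so that the edges use high precision (small shift)
     * and the center uses lower precision (large shift).
     */
    static List<SubRange> splitRange(long min, long max, int step) {
        List<SubRange> out = new ArrayList<SubRange>();
        for (int shift = 0; ; shift += step) {
            long diff = 1L << (shift + step);
            long mask = ((1L << step) - 1L) << shift;
            boolean hasLower = (min & mask) != 0L;   // min is not aligned to the next precision
            boolean hasUpper = (max & mask) != mask; // max does not end a block of the next precision
            long nextMin = (hasLower ? min + diff : min) & ~mask;
            long nextMax = (hasUpper ? max - diff : max) & ~mask;
            boolean wrapped = nextMin < min || nextMax > max;
            if (shift + step >= 64 || nextMin > nextMax || wrapped) {
                // No aligned center is left: cover the remainder at the current precision.
                out.add(new SubRange(min, max, shift));
                return out;
            }
            if (hasLower) {
                out.add(new SubRange(min, min | mask, shift));
            }
            if (hasUpper) {
                out.add(new SubRange(max & ~mask, max, shift));
            }
            min = nextMin;
            max = nextMax;
        }
    }
}
```

As a small worked case with a 4-bit step, splitRange(0x123, 0x456, 4) produces five sub-ranges: 0x123..0x12F and 0x450..0x456 at full precision, 0x13..0x1F and 0x40..0x44 at shift 4, and 0x2..0x3 at shift 8. That is 40 terms to visit instead of the 820 distinct values in the range. The example exercises non-negative bounds only.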

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

