lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (LUCENE-1470) Add TrieRangeQuery to contrib
Date Sat, 07 Feb 2009 17:56:59 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671495#action_12671495 ]

yseeley@gmail.com edited comment on LUCENE-1470 at 2/7/09 9:56 AM:
--------------------------------------------------------------

Attaching completely untested prototype TrieUtils.java

some discussion:
http://www.lucidimagination.com/search/document/d62c0fd21d88f880

Features:
  - same encode/decode code works for any variant... no 2,4,8 bit specific instances
  - decouples "slicing" of the value into different precisions from encoding of the slice to
a String, allowing for the most efficient String encoding to be used for every precision variant.
  - 7 bit char encoding to optimize for UTF8 index storage
  - right justified to allow lucene to prefix compress efficiently
  - separates creation of sortableBits from trie encoding of those bits to avoid so many methods
  - allows indexing into multiple fields, or all in the same field
  - much smaller code should be much easier to understand
  - left out "Date" support - the average Java developer understands how to go from a Date
to a long (unlike double, etc).
  - relatively trivial to add 32 bit (int/float) support and reuse code like addIndexedFields
(which is just an agnostic helper method).
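
To make the sortable-bits, slicing, and 7 bit char points above concrete, here is a minimal sketch. It is not the code from the attached TrieUtils.java; the class name, method names, and the leading precision-tag char are all assumptions made for illustration.

```java
// Hypothetical sketch, not the attached TrieUtils.java.
final class TrieEncodingSketch {

    // Step 1: turn a double into a long whose signed order matches numeric order.
    static long doubleToSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // For negative values, flip the 63 non-sign bits so that larger
        // magnitudes sort lower.
        if (bits < 0) {
            bits ^= 0x7fffffffffffffffL;
        }
        return bits;
    }

    // Step 2: drop the lowest `shift` bits (the "slice"), then encode the
    // remaining bits right justified, 7 bits per char.
    static String encodeSlice(long sortableBits, int shift) {
        if (shift < 0 || shift > 63) {
            throw new IllegalArgumentException("shift must be in 0..63");
        }
        // Flip the sign bit so that unsigned order equals signed order.
        long slice = (sortableBits ^ Long.MIN_VALUE) >>> shift;
        int nChars = (64 - shift + 6) / 7;
        char[] buf = new char[nChars + 1];
        // Assumed convention: the first char tags the precision, so all
        // precisions can share one field without colliding.
        buf[0] = (char) (0x20 + shift);
        // Fill from the right; every char is below 0x80, so it takes one
        // byte in UTF-8.
        for (int i = nChars; i >= 1; i--) {
            buf[i] = (char) (slice & 0x7f);
            slice >>>= 7;
        }
        return new String(buf);
    }

    public static void main(String[] args) {
        long bits = doubleToSortableLong(3.25);
        for (int shift = 0; shift < 64; shift += 8) {
            System.out.println("shift " + shift + ": "
                + encodeSlice(bits, shift).length() + " chars");
        }
    }
}
```

Because the two steps are separate, the same encodeSlice would work for any value that can be mapped to sortable long bits, and any precision step can be chosen by the caller.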


      was (Author: yseeley@gmail.com):
    Attaching completely untested prototype TrieUtils.java

some discussion:
http://www.lucidimagination.com/search/document/d62c0fd21d88f880

Features:
  - same encode/decode code works for any variant... no 2,4,8 bit specific instances
  - 7 bit encoding to optimize for UTF8 index storage
  - right justified to allow lucene to prefix compress efficiently
  - separates creation of sortableBits from trie encoding of those bits to avoid so many methods
  - allows indexing into multiple fields, or all in the same field
  - much smaller code should be much easier to understand
  - left out "Date" support - the average Java developer understands how to go from a Date
to a long (unlike double, etc).
  - relatively trivial to add 32 bit (int/float) support and reuse code like addIndexedFields
(which is just an agnostic helper method).

  
> Add TrieRangeQuery to contrib
> -----------------------------
>
>                 Key: LUCENE-1470
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1470
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>    Affects Versions: 2.4
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: fixbuild-LUCENE-1470.patch, fixbuild-LUCENE-1470.patch, LUCENE-1470-readme.patch,
LUCENE-1470.patch, LUCENE-1470.patch, LUCENE-1470.patch, LUCENE-1470.patch, LUCENE-1470.patch,
LUCENE-1470.patch, LUCENE-1470.patch, TrieUtils.java
>
>
> According to the thread in java-dev (http://www.gossamer-threads.com/lists/lucene/java-dev/67807
and http://www.gossamer-threads.com/lists/lucene/java-dev/67839), I want to include my fast
numerical range query implementation into lucene contrib-queries.
> I implemented (based on RangeFilter) another approach for faster
> RangeQueries, based on longs stored in index in a special format.
> The idea behind this is to store the longs in different precisions in the index
> and partition the query range in such a way that the outer boundaries are
> searched using terms from the highest precision, but the center of the search
> range with lower precision. The implementation stores the longs in 8
> different precisions (using a class called TrieUtils). It also has support
> for Doubles, using the IEEE 754 floating-point "double format" bit layout
> with some bit mappings to make them binary sortable. The approach is used in
> rather big indexes; even on low-performance desktop computers, query times
> are <<100 ms (!) for very big ranges on indexes with 500000 docs.
> I called this RangeQuery variant and format "TrieRangeRange" query because
> the idea looks like the well-known trie structures (it is not identical
> to real tries, but the algorithms are related).
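
The range partitioning described in the issue text can be sketched roughly as follows. This is only one way to do the split and is not taken from any of the attached patches; the class name, the splitRange signature, and the {low, high, shift} result layout are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting [min, max] into sub-ranges of mixed precision.
final class TrieRangeSplitSketch {

    // Each result is {low, high, shift}: the full-precision bounds covered by
    // a sub-range whose terms would be looked up at precision `shift`.
    static List<long[]> splitRange(long min, long max, int precisionStep) {
        List<long[]> out = new ArrayList<>();
        if (min > max) {
            return out;
        }
        for (int shift = 0; ; shift += precisionStep) {
            long diff = 1L << (shift + precisionStep);
            long mask = ((1L << precisionStep) - 1L) << shift;
            // Does the current bound sit inside a block of the next lower precision?
            boolean hasLower = (min & mask) != 0L;
            boolean hasUpper = (max & mask) != mask;
            long nextMin = (hasLower ? (min + diff) : min) & ~mask;
            long nextMax = (hasUpper ? (max - diff) : max) & ~mask;
            boolean wrapped = nextMin < min || nextMax > max;
            if (shift + precisionStep >= 64 || nextMin > nextMax || wrapped) {
                // Nothing left for a lower precision: emit the rest here.
                add(out, min, max, shift);
                break;
            }
            // Outer edges are covered with terms of the current (higher) precision.
            if (hasLower) {
                add(out, min, min | mask, shift);
            }
            if (hasUpper) {
                add(out, max & ~mask, max, shift);
            }
            // The center continues at the next lower precision.
            min = nextMin;
            max = nextMax;
        }
        return out;
    }

    private static void add(List<long[]> out, long lo, long hi, int shift) {
        // Extend the upper bound to the end of its block at this precision.
        out.add(new long[] {lo, hi | ((1L << shift) - 1L), shift});
    }

    public static void main(String[] args) {
        for (long[] r : splitRange(0x13L, 0x58L, 4)) {
            System.out.println(Long.toHexString(r[0]) + ".."
                + Long.toHexString(r[1]) + " at shift " + r[2]);
        }
    }
}
```

With a precision step of 4, the range 0x13..0x58 splits into two edge sub-ranges at full precision and one center sub-range at shift 4, so far fewer terms need to be visited than the 70 distinct values the range spans.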

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

