lucene-java-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-java Wiki] Update of "SearchNumericalFields" by UweSchindler
Date Sat, 27 Jun 2009 16:31:03 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The following page has been changed by UweSchindler:
http://wiki.apache.org/lucene-java/SearchNumericalFields

The comment on the change is:
change from TrieRange to NumericRange

------------------------------------------------------------------------------
  = Searching Numerical Fields =
  
- == TrieRangeQuery (in contrib/search since version 2.9-dev, which is not yet released) ==
+ == NumericRangeQuery (in Lucene Core since version 2.9-dev, which is not yet released) ==
  
- Because Apache Lucene is a full-text search engine and not a conventional database, it cannot
handle numerical ranges (e.g., field value is inside user defined bounds, even dates are numerical
values). We have developed an extension to Apache Lucene that stores the numerical values
in a special string-encoded format with variable precision (all numerical values like doubles,
longs, Dates, floats, and ints are converted to lexicographic sortable string representations
and stored with different precisions. For a more detailed description of how the values are
stored, see TrieUtils. A range is then divided recursively into multiple intervals for searching:
The center of the range is searched only with the lowest possible precision in the trie, while
the boundaries are matched more exactly. This reduces the number of terms dramatically. See:
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/all/org/apache/lucene/search/trie/package-summary.html
+ Because Apache Lucene is a full-text search engine and not a conventional database, it cannot
handle numerical ranges (e.g., a field value that lies inside user-defined bounds; even dates are
numerical values). We have developed an extension to Apache Lucene that stores the numerical values
in a special string-encoded format with variable precision, called a trie: all numerical values,
such as doubles, longs, Dates, floats, and ints, are converted to lexicographically sortable string
representations and stored with different precisions. A range is then divided recursively
into multiple intervals for searching: the center of the range is searched only with the lowest
possible precision in the trie, while the boundaries are matched more exactly. This dramatically
reduces the number of terms that have to be visited. See: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/core/org/apache/lucene/search/NumericRangeQuery.html
  
  This dramatically improves the performance of Apache Lucene with range queries: query time is
no longer dependent on the index size and number of distinct values, because the number of terms
to visit has an upper limit not related to any of these properties.
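
The recursive splitting described above can be sketched in plain Java. This is only an illustration under stated assumptions, not Lucene's implementation: the class `TrieRangeSketch`, its methods, the 16-bit value size and the 4-bit precision step are all hypothetical choices made for the example, and values are assumed to be non-negative ints below 2^valSize.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names and parameters are assumptions, not Lucene API. */
class TrieRangeSketch {

    /**
     * Splits the inclusive range [min, max] into sub-ranges of prefixes.
     * Each result is {shift, lowPrefix, highPrefix}: it matches every value v
     * with lowPrefix <= (v >>> shift) <= highPrefix.
     * Assumes 0 <= min, max < 2^valSize and a small valSize (no int overflow).
     */
    static List<int[]> split(int min, int max, int step, int valSize) {
        List<int[]> out = new ArrayList<>();
        if (min > max) {
            return out;
        }
        for (int shift = 0; ; shift += step) {
            int diff = 1 << (shift + step);           // size of one block at the next precision
            int mask = ((1 << step) - 1) << shift;    // bits that distinguish terms at this precision
            boolean hasLower = (min & mask) != 0;     // lower bound not aligned to next precision
            boolean hasUpper = (max & mask) != mask;  // upper bound not aligned to next precision
            int nextMin = (hasLower ? min + diff : min) & ~mask;
            int nextMax = (hasUpper ? max - diff : max) & ~mask;
            if (shift + step >= valSize || nextMin > nextMax) {
                // Lowest precision reached, or nothing left in the middle:
                // cover the remainder at the current precision and stop.
                out.add(new int[] {shift, min >>> shift, max >>> shift});
                break;
            }
            if (hasLower) {
                // Boundary piece at the lower end, matched at the current precision.
                out.add(new int[] {shift, min >>> shift, (min | mask) >>> shift});
            }
            if (hasUpper) {
                // Boundary piece at the upper end, matched at the current precision.
                out.add(new int[] {shift, (max & ~mask) >>> shift, max >>> shift});
            }
            min = nextMin;  // continue with the center at the next lower precision
            max = nextMax;
        }
        return out;
    }

    /** Returns true if the value falls into the given sub-range. */
    static boolean matches(int[] part, int value) {
        int prefix = value >>> part[0];
        return prefix >= part[1] && prefix <= part[2];
    }

    public static void main(String[] args) {
        for (int[] p : split(3, 100, 4, 16)) {
            System.out.println("shift=" + p[0] + " prefixes " + p[1] + ".." + p[2]);
        }
    }
}
```

In this sketch the range 3..100 is covered by three sub-ranges: the boundary pieces 3..15 and 96..100 at full precision, and the center 16..95 as the five 4-bit-shifted prefixes 1..5. That is 23 prefix terms instead of 98 distinct values, which is the effect the paragraph above describes.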
  
- Trie''''''Range''''''Query can be used for date/time searches (if you need variable precision
of date and time downto milliseconds), double searches (e.g. spatial search for latitudes
or longitudes), prices (if encoded as long using cent values, doubles are not good for price
values because of rounding problems). The document fields containing the trie encoded values
are generated by the Trie''''''Utils class. The values can also be stored in index using the
trie encoding, for displaying they can be converted back to the primitive types. Trie''''''Utils
also supplies a factory for Sort''''''Field instances on trie encoded fields that automatically
uses an Extended''''''Field''''''Cache.Long''''''Parser or ''''''Field''''''Cache.Int''''''Parser
for efficient sorting of the primitive types.
+ Numeric''''''Range''''''Query (formerly Trie''''''Range''''''Query) can be used for date/time
searches (if you need variable precision of date and time down to milliseconds), double searches
(e.g. spatial search for latitudes or longitudes), and prices (if encoded as long using cent values;
doubles are not good for price values because of rounding problems). The document fields containing
the trie encoded values are generated by a special Numeric''''''Token''''''Stream or, more simply,
by using the new field implementation Numeric''''''Field (see http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/core/org/apache/lucene/document/NumericField.html).
Numeric fields can be sorted on (a special parser is included in Field''''''Cache) and used
in function queries (through Field''''''Cache).
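
Two points from the paragraph above, why prices are better held as long cent values and how a double can be mapped to a value whose ordering matches numeric ordering, can be illustrated with a small standalone sketch. The class and method names here are hypothetical, no Lucene classes are used, and the bit manipulation shown is a commonly used technique that is assumed, not confirmed by this page, to resemble what the trie encoding does internally.

```java
/** Illustrative sketch only; names are assumptions, not Lucene API. */
class SortableEncodingSketch {

    /**
     * Maps a double to a long so that comparing the longs as signed values
     * gives the same order as comparing the doubles.
     */
    static long doubleToSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        if (bits < 0) {
            // Negative doubles: flip the 63 value bits so that a larger
            // magnitude becomes a smaller long.
            bits ^= 0x7fffffffffffffffL;
        }
        return bits;
    }

    /** Inverse of doubleToSortableLong. */
    static double sortableLongToDouble(long sortable) {
        if (sortable < 0) {
            sortable ^= 0x7fffffffffffffffL;
        }
        return Double.longBitsToDouble(sortable);
    }

    /** Adds two prices held as cent values; exact, unlike double arithmetic. */
    static long addCents(long a, long b) {
        return a + b;
    }

    public static void main(String[] args) {
        // Doubles accumulate rounding error, long cent values do not.
        System.out.println((0.1 + 0.2) == 0.3);           // false
        System.out.println(addCents(10, 20) == 30);       // true

        // Ordering is preserved across the sign boundary.
        System.out.println(doubleToSortableLong(-1.5) < doubleToSortableLong(2.0)); // true
    }
}
```

A latitude or longitude could be passed through a mapping like `doubleToSortableLong` before range matching, while a price such as 19.99 would be indexed as the long value 1999.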
  
  == Other possibilities for storing numerical values in a more readable form in the index ==
  
