lucene-java-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-java Wiki] Update of "SearchNumericalFields" by UweSchindler
Date Sun, 25 Jan 2009 21:02:37 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The following page has been changed by UweSchindler:
http://wiki.apache.org/lucene-java/SearchNumericalFields

The comment on the change is:
update hudson link

------------------------------------------------------------------------------
  
  == TrieRangeQuery (in contrib/search since version 2.9-dev, which is not yet released) ==
  
- Because Apache Lucene is a full-text search engine and not a conventional database, it cannot
handle numerical ranges efficiently (e.g., a field value lying inside user-defined bounds; dates
are numerical values, too). A contrib extension was developed that stores numerical values in a
special string-encoded format with variable precision: all numerical values, such as doubles,
longs, and timestamps, are converted to lexicographically sortable string representations and
stored with different precisions, from one byte to the full 8 bytes, depending on the variant
used. A range is then divided recursively into multiple intervals for searching: the center of the
range is searched only with the lowest possible precision in the trie, while the boundaries are
matched more exactly. This reduces the number of terms dramatically. See: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/search/trie/package-summary.html
+ Because Apache Lucene is a full-text search engine and not a conventional database, it cannot
handle numerical ranges efficiently (e.g., a field value lying inside user-defined bounds; dates
are numerical values, too). A contrib extension was developed that stores numerical values in a
special string-encoded format with variable precision: all numerical values, such as doubles,
longs, and timestamps, are converted to lexicographically sortable string representations and
stored with different precisions, from one byte to the full 8 bytes, depending on the variant
used. A range is then divided recursively into multiple intervals for searching: the center of the
range is searched only with the lowest possible precision in the trie, while the boundaries are
matched more exactly. This reduces the number of terms dramatically. See: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/all/org/apache/lucene/search/trie/package-summary.html
  
  This dramatically improves the performance of range queries in Apache Lucene: query time
no longer depends on the index size or the number of distinct values, because the number of
terms to be matched has an upper limit that is unrelated to either property.
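
The range splitting described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the contrib/search code: the class TrieRangeSketch, its methods, the hex key format, and the 8-bit precision step are all hypothetical choices made for the example, and the bounds are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the contrib/search implementation. */
class TrieRangeSketch {

    /** Fixed-width hex key whose String order matches the signed numeric order. */
    static String sortableKey(long value) {
        // Flipping the sign bit makes negative values sort before positive ones.
        return String.format("%016x", value ^ Long.MIN_VALUE);
    }

    /**
     * Splits [min, max] into sub-ranges {lower, upper, shift}. A sub-range with
     * shift s is matched on terms that drop the lowest s bits of each value.
     */
    static List<long[]> split(long min, long max, int step) {
        if (min < 0 || max < min || step < 1 || step > 32) {
            throw new IllegalArgumentException("need 0 <= min <= max and 1 <= step <= 32");
        }
        List<long[]> out = new ArrayList<>();
        for (int shift = 0; ; shift += step) {
            long diff = 1L << (shift + step);
            long mask = ((1L << step) - 1L) << shift;
            boolean hasLower = (min & mask) != 0L;   // ragged lower edge at this precision
            boolean hasUpper = (max & mask) != mask; // ragged upper edge at this precision
            long nextMin = (hasLower ? min + diff : min) & ~mask;
            long nextMax = (hasUpper ? max - diff : max) & ~mask;
            boolean wrapped = nextMin < min || nextMax > max;
            if (shift + step >= 64 || nextMin > nextMax || wrapped) {
                // No coarser level fits inside the range: emit the rest here.
                add(out, min, max, shift);
                break;
            }
            if (hasLower) add(out, min, min | mask, shift);
            if (hasUpper) add(out, max & ~mask, max, shift);
            min = nextMin; // the center continues at the next, coarser precision
            max = nextMax;
        }
        return out;
    }

    private static void add(List<long[]> out, long lower, long upper, int shift) {
        // Restore the low bits that the shifted term ignores, so bounds stay inclusive.
        out.add(new long[] {lower, upper | ((1L << shift) - 1L), shift});
    }

    /** Number of distinct shifted terms a sub-range can touch. */
    static long termCount(long[] r) {
        int shift = (int) r[2];
        return (r[1] >>> shift) - (r[0] >>> shift) + 1;
    }

    public static void main(String[] args) {
        long total = 0;
        for (long[] r : split(300L, 70000L, 8)) {
            System.out.println("[" + r[0] + ", " + r[1] + "] shift=" + r[2]
                    + " terms=" + termCount(r));
            total += termCount(r);
        }
        System.out.println("terms visited: " + total + " of " + (70000L - 300L + 1) + " values");
    }
}
```

For the range 300 to 70000, this sketch produces two full-precision edge ranges and one center range at shift 8, which together touch at most 596 terms instead of 69701 distinct values (assuming every value in the range occurs in the index).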
  
