lucene-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 36135] - Numeric range searching with large value sets
Date Thu, 11 Aug 2005 17:39:53 GMT

http://issues.apache.org/bugzilla/show_bug.cgi?id=36135





------- Additional Comments From randy@zillow.com  2005-08-11 19:39 -------
(In reply to comment #14)
> Randy, I haven't looked at the source yet, but is the aim to allow for queries
> such as "give me documents that have a price in a range from 0.99 to 9.99"?
> 
> From the sound of this, this looks useful.
> I hardly ever use range queries, so could you please explain what you mean by
> "... searches over numeric ranges that are far too large to be implemented via
> the current term range rewrite mechanism"?
> 
> How does the current implementation deal with a large numeric range, and how
> does your contribution fix it?
> 
> Thanks.
> 


Yes, this change efficiently implements floating-point ranges such as 0.99 to 9.99.

The current rewrite scans terms in index order (that is, lexicographically)
from the beginning of the range and generates a boolean query that is the
disjunction of all the terms falling in the lexicographic range.  Any large
range throws TooManyClauses.  As noted in an earlier comment, bug 34673 is a
contribution that fixes this case essentially by turning the query into a filter.
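
To illustrate why a large range fails under the current rewrite, here is a
minimal standalone sketch.  It does not use the Lucene API: TermRangeExpansion,
expand and the maxClauses parameter are hypothetical names, a TreeSet stands in
for the term dictionary, and IllegalStateException stands in for TooManyClauses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for the term-range rewrite; not Lucene code.
public class TermRangeExpansion {

    // Walks the terms in lexicographic order between lower and upper
    // (both inclusive) and collects one clause per term, failing once
    // the clause limit is exceeded.
    static List<String> expand(TreeSet<String> terms, String lower,
                               String upper, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms.subSet(lower, true, upper, true)) {
            if (clauses.size() >= maxClauses) {
                // Assumption: plays the role of TooManyClauses.
                throw new IllegalStateException(
                        "too many clauses: limit is " + maxClauses);
            }
            clauses.add(term);
        }
        return clauses;
    }
}
```

With 2000 distinct zero-padded terms and a limit of 1024, a narrow range such
as "0010" to "0019" expands to 10 clauses, while "0000" to "1999" fails.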

My submission catches the TooManyClauses exception within the rewrite method
of RangeQuery and then attempts to construct either an IntegerRangeQuery or a
FloatRangeQuery (so you can control when this mechanism kicks in by adjusting
the maximum number of boolean clauses).  Both of these work by:
1) pulling field values for all docnos (as in the numeric sorting solution)
2) creating an array of docnos, then sorting that array by field value
3) at query time, finding the range bounds by (indirect) binary search on the
sorted docno array, which yields the set of docnos that match the range
4) sorting this set of docnos back into docno order and returning it
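
The four steps above can be sketched roughly as follows.  This is not the
submitted patch: NumericRangeIndex and its methods are hypothetical names, the
field values are passed in as a plain float array indexed by docno, and
inclusive bounds are an assumption.

```java
import java.util.Arrays;

// Hypothetical illustration of the sorted-docno approach; not the patch itself.
public class NumericRangeIndex {
    private final float[] values;      // step 1: values[docno] = field value
    private final int[] sortedDocnos;  // step 2: docnos ordered by field value

    public NumericRangeIndex(float[] valuesByDocno) {
        final float[] vals = valuesByDocno.clone();
        this.values = vals;
        Integer[] boxed = new Integer[vals.length];
        for (int i = 0; i < boxed.length; i++) {
            boxed[i] = i;
        }
        // Sort docnos by the value they point at, not by docno.
        Arrays.sort(boxed, (a, b) -> Float.compare(vals[a], vals[b]));
        this.sortedDocnos = new int[boxed.length];
        for (int i = 0; i < boxed.length; i++) {
            sortedDocnos[i] = boxed[i];
        }
    }

    // Step 3: indirect binary search. Returns the first position whose
    // value is >= bound (strict == false) or > bound (strict == true).
    private int firstIndex(float bound, boolean strict) {
        int lo = 0;
        int hi = sortedDocnos.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            float v = values[sortedDocnos[mid]];
            boolean goRight = strict ? v <= bound : v < bound;
            if (goRight) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Returns the docnos whose value lies in [lower, upper], in docno order.
    public int[] search(float lower, float upper) {
        int from = firstIndex(lower, false);
        int to = firstIndex(upper, true);
        if (from >= to) {
            return new int[0];
        }
        int[] hits = Arrays.copyOfRange(sortedDocnos, from, to);
        Arrays.sort(hits); // step 4: back into docno order
        return hits;
    }
}
```

For example, with values {12.50, 0.99, 100.0, 5.00, 9.99, 0.50} for docnos 0
through 5, search(0.99f, 9.99f) returns docnos 1, 3 and 4.  The query cost
depends on the number of matching documents (two binary searches plus a sort
of the hits) rather than on the number of distinct terms in the range.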

Range searching is useful for, e.g., prices and heights.
