From: Doug Cutting
To: 'Lucene Users List'
Subject: RE: Searching numerical ranges
Date: Tue, 19 Feb 2002 14:27:35 -0800

> From: David Elworthy [mailto:dahe@lingomotors.com]
>
> I want to be able to search on a field which contains a numerical
> value, specifying a range, such as 1-100. If my understanding of
> Lucene is correct, all fields look essentially like strings, so a
> simple range query won't work (after all, searching on the range
> "a"-"azz" should not match "b"). So my plan is to pad up all numbers
> to a fixed length by prefixing them with zeros on both indexing and
> search, so the range then becomes (e.g.) 000001-000100.

That sounds like a good strategy.

> My one worry is that it will upset the rankings, as numbers which
> happen to have occurred in more documents will get a lower IDF,
> whereas all numbers really ought to receive equal treatment. So a
> possible refinement is to include the clause for the number in my
> overall boolean expression, but give it a boost of zero or some small
> number. So it has to match but does not contribute to the relevance.

That should work.

Another alternative is to implement a Filter, which does not affect
scoring at all. A Filter is just a bit vector which contains ones for
documents which should be included and zeros for others. That's what
the date code uses.

Doug
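
For anyone following the thread, here is a minimal sketch of the two ideas discussed: zero-padding numbers so that string comparison agrees with numeric order, and a bit vector that marks which documents fall in the range. It is plain Java and deliberately uses no Lucene classes; the names PaddedRangeSketch, pad, inRange and rangeBits are hypothetical, and the array of per-document values stands in for whatever an index would supply. It assumes non-negative values that fit in the chosen width.

```java
import java.util.BitSet;

// Illustrative only: hypothetical names, no Lucene API involved.
class PaddedRangeSketch {

    // Left-pad a non-negative number with zeros to a fixed width,
    // so that lexicographic order matches numeric order.
    static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        String digits = Long.toString(value);
        if (digits.length() > width) {
            throw new IllegalArgumentException("value does not fit in " + width + " digits");
        }
        StringBuilder sb = new StringBuilder(width);
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    // Inclusive range test using string comparison, which is how a
    // term-based range over string fields would behave.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // Build a bit vector: bit i is set if document i's value lies in [lo, hi].
    // docValues[i] is an assumed stand-in for the numeric field of document i.
    static BitSet rangeBits(long[] docValues, long lo, long hi, int width) {
        String lower = pad(lo, width);
        String upper = pad(hi, width);
        BitSet bits = new BitSet(docValues.length);
        for (int doc = 0; doc < docValues.length; doc++) {
            if (inRange(pad(docValues[doc], width), lower, upper)) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Unpadded strings: "5" sorts after "100", so the range misses it.
        System.out.println(inRange("5", "1", "100"));                 // false
        // Padded strings: the same test now agrees with numeric order.
        System.out.println(inRange(pad(5, 6), pad(1, 6), pad(100, 6))); // true

        long[] docValues = {5, 250, 100, 7000};
        System.out.println(rangeBits(docValues, 1, 100, 6));          // {0, 2}
    }
}
```

Because the bit vector only decides which documents are eligible, it plays no part in scoring, which is the property the Filter approach is suggested for.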