lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Base score to use for custom query?
Date Tue, 27 Apr 2010 18:14:00 GMT

First off: if you haven't already, make sure you OMIT_NORMS when indexing 
this field, that way you don't have to worry about docs with "lots" of 
numbers scoring low purely because of the fieldNorm.

Second: I wouldn't bother with a custom query, I would stick with your 
BooleanQuery approach, but make sure you do two things:

1) give each of your TermQueries a boost based on how far it is from the 
end of the range. So if you have a range like [10 19], give the 19 clause 
a boost of 1, the 18 clause a boost of 2, the 17 clause a boost of 3, 
etc...

2) disable the coord.  There is an option on BooleanQuery to do this, and 
it will make sure docs that only match one clause in your BooleanQuery 
don't get a penalty compared to docs that match many clauses in your 
BooleanQuery -- which is going to be important in ensuring that your 
boosts are useful.
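
As a rough illustration of the boost scheme in (1), here is a minimal 
sketch in plain Java that only computes the per-term boosts; it does not 
call Lucene at all. The class name RangeBoosts and the method 
boostsForRange are made up for this example, and the linear "distance from 
the end of the range, plus one" formula is simply one reading of the 
[10 19] example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: maps each term in the search
// range to a boost that grows the further the term is from the range end.
public class RangeBoosts {

    // Returns term -> boost for the inclusive range [lo, hi].
    // The last term (hi) gets 1, hi-1 gets 2, and so on up to lo.
    public static Map<Integer, Float> boostsForRange(int lo, int hi) {
        if (lo > hi) {
            throw new IllegalArgumentException("lo must be <= hi");
        }
        Map<Integer, Float> boosts = new LinkedHashMap<Integer, Float>();
        for (int term = lo; term <= hi; term++) {
            boosts.put(term, (float) (hi - term + 1));
        }
        return boosts;
    }

    public static void main(String[] args) {
        // In a real index each entry would become one boosted TermQuery
        // added as an optional clause to a coord-disabled BooleanQuery;
        // that wiring is left out here so the sketch runs without Lucene.
        for (Map.Entry<Integer, Float> e : boostsForRange(10, 19).entrySet()) {
            System.out.println("term " + e.getKey() + " -> boost " + e.getValue());
        }
    }
}
```

Whether a linear ramp like this is steep enough for your data is something 
the score explanations should tell you; a doc that matches many low-boost 
clauses can still add up to more than a doc that matches one high-boost 
clause.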

That should get you what you want, and if not then take a look at the 
score explanations and see if anything obvious jumps out -- post a 
followup with your code and the score explanations if you can't solve it 
to your liking.

: I have a field in my document that contains a range of numbers. Say, for
: example, the universe of numbers is the range of integers from 0-100. My
: field represents a subrange of those numbers in a token stream. So, for
: example, if one document contains 20-30, its token stream contains the
: terms [20, 21, 22, ..., 29]. Now I can quickly find all documents that
: contain some number.
: 
: The next part of the problem is searching for all documents that intersect
: with some subrange of numbers. Somewhat like a range query, but not exactly.
: Say I want to search for all documents that touch the range [10, 30]. My
: original implementation was to simply create a BooleanQuery full of
: TermQuerys for each term in the range i was searching for. While this
: returned the proper results, it did so with skewed scores. I'd prefer
: documents containing numbers towards the beginning of my search range to be
: scored higher than docs towards the end. So, if I had two documents, one
: with 10-20, and one with 20-30, and I searched for [19,30], both documents
: would be returned, but the second would be much more highly scored due to
: its higher number of matched terms.
: 
: So, my plan is to write a custom query which matches documents in
: my range in a way such as:


-Hoss

