lucene-java-user mailing list archives

From Joel Halbert <j...@su3analytics.com>
Subject relevance function for scores
Date Mon, 18 May 2009 12:52:41 GMT
Hi,

I'd like to apply a score filter. I realise that filtering on an absolute
score threshold (i.e. dropping anything that scores less than x) is pretty
meaningless.

In my case I want to filter based on relative score - or on some
function of score which looks for clustering of documents around certain
score values.

Context: I have set up field boosts such that a query hit on one indexed
field will, in theory, result in a score one or more orders of magnitude
greater than a hit on some other field. So if I have 2 fields, A and B,
and I'm mainly interested in hits on A, and only interested in
hits on B if there were none on A, I boost A by 1000 relative to B.
The resultant score should reflect this.

The ability to do this becomes important when we want to re-order the
search results around some other field (not score) and are not
interested in displaying the least relevant documents.


It is easy to write a basic 'document collector/result filter'
that uses relative score information to filter out documents whose
score is less than some fraction of the best score, but I'm sure this
could be generalised more elegantly into some mathematical
"relevance/significance" model/function which could determine an
optimal cutoff for documents based on the clustering of results around
scores.
e.g. if my top 5 documents all score between 0.9 and 0.7 and the
remaining 10 score less than 0.01, then we could sensibly take the top 5
docs as the most relevant.
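
To make the two ideas concrete, here is a rough sketch in plain Java. It
deliberately does not touch the Lucene API: it assumes the scores have
already been pulled out of the hits into a float array sorted from best to
worst, and that all scores are positive. The class name ScoreCutoff and both
method names are made up for illustration. The first method is the simple
"fraction of the best score" filter; the second is one possible
(unvalidated) way to look for a cluster boundary, by cutting at the largest
ratio between consecutive scores.

```java
final class ScoreCutoff {
    private ScoreCutoff() {}

    /**
     * Simple relative filter: how many leading hits score at least
     * fraction * (best score). Assumes scores are sorted descending.
     */
    static int countWithinFractionOfBest(float[] scores, float fraction) {
        if (scores.length == 0) {
            return 0;
        }
        float threshold = scores[0] * fraction;
        int kept = 0;
        while (kept < scores.length && scores[kept] >= threshold) {
            kept++;
        }
        return kept;
    }

    /**
     * Gap heuristic: find the position where the ratio between one score
     * and the next is largest, and keep everything before it. If no score
     * is lower than its predecessor, all hits are kept.
     * Assumes scores are positive and sorted descending.
     */
    static int countBeforeLargestDrop(float[] scores) {
        int cut = scores.length;
        double largestRatio = 1.0; // a ratio of 1.0 means "no drop at all"
        for (int i = 1; i < scores.length; i++) {
            double ratio = (double) scores[i - 1] / scores[i];
            if (ratio > largestRatio) {
                largestRatio = ratio;
                cut = i;
            }
        }
        return cut;
    }
}
```

With scores like the example above (five hits between 0.9 and 0.7, the rest
below 0.01), both methods keep the top 5: the first when called with a
fraction of 0.1, the second because the drop from 0.7 to the next score is
by far the largest ratio. The gap heuristic always picks exactly one cut
point, so it would probably need a minimum-ratio guard before being trusted
on result sets with no real clustering.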

Has anyone experience of doing such a thing?


Regards,
Joel


