lucene-dev mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: [jira] [Commented] (SOLR-3747) Solr Score threshold 'reasonably', independent of results returned
Date Wed, 22 Aug 2012 17:08:45 GMT
What I've seen done in the past that _might_ work is, rather than
some arbitrary threshold applied across different search results, use
a delta between two successive scores as your cutoff. For instance,
say you had three documents with scores like:

doc 1 - .90
doc 2 - .80
doc 3 - .20

The delta between 2 and 3 is 75% of the score for doc 2, so you
could choose that as your cutoff point. So your actual calculation
is something like
(docN_score - docN+1_score)/docN_score >= 0.75
where .75 is whatever percentage you decide is "too big a gap".

This avoids the problem of cross-query scores not being comparable,
and really implements something like "when the score drops lots,
stop showing the results".
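
A minimal sketch of that successive-delta cutoff, in Java. The names
(DeltaCutoff, keepCount) and the 0.5 threshold are made up for
illustration and are not Solr API; it assumes the scores are positive
and already sorted in descending order.

```java
import java.util.Arrays;

class DeltaCutoff {
    /**
     * Returns how many leading results to keep.
     * Assumes scores are positive and sorted in descending order.
     */
    static int keepCount(double[] scores, double maxRelativeDrop) {
        for (int i = 0; i + 1 < scores.length; i++) {
            // relative drop from doc i to doc i+1
            double drop = (scores[i] - scores[i + 1]) / scores[i];
            if (drop >= maxRelativeDrop) {
                return i + 1; // keep docs 0..i, discard the rest
            }
        }
        return scores.length; // no large gap found, keep everything
    }

    public static void main(String[] args) {
        double[] scores = {0.90, 0.80, 0.20};
        int kept = keepCount(scores, 0.5);
        // keeps the first two docs; the third falls after the big gap
        System.out.println(Arrays.toString(Arrays.copyOf(scores, kept)));
    }
}
```

Because each doc is only compared with the one before it, a list whose
scores fade gradually is never cut, which may or may not be what you want.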

Or you could do something like that above, but instead of docN, always
use the score of the first doc. Or...
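
A sketch of that first-doc variant, under the same assumptions
(hypothetical names, positive scores sorted descending, nothing here is
Solr API). Each score is compared with the top score instead of with
its predecessor.

```java
class TopScoreCutoff {
    /**
     * Returns how many leading results to keep, measuring every drop
     * against the score of the first (best) document.
     */
    static int keepCount(double[] scores, double maxRelativeDrop) {
        if (scores.length == 0) {
            return 0;
        }
        double top = scores[0];
        for (int i = 1; i < scores.length; i++) {
            // relative drop from the top score to doc i
            if ((top - scores[i]) / top >= maxRelativeDrop) {
                return i; // keep docs 0..i-1
            }
        }
        return scores.length;
    }

    public static void main(String[] args) {
        // scores fade gradually; no single step loses half the score
        double[] scores = {1.0, 0.7, 0.45, 0.3};
        System.out.println(keepCount(scores, 0.5));
    }
}
```

Unlike the successive-delta version, this one does eventually cut a list
that fades gradually, since the distance from the top score keeps growing.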

This will probably fall down when the doc scores are all small. It might
not really say much if the relevance of doc 1 is 0.02 and doc 2 is 0.002.
The delta in terms of percent is large, but I'm not sure it really means
anything.

FWIW,
Erick

On Wed, Aug 22, 2012 at 6:00 AM, alsadi (JIRA) <jira@apache.org> wrote:
>
>     [ https://issues.apache.org/jira/browse/SOLR-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439388#comment-13439388 ]
>
> alsadi commented on SOLR-3747:
> ------------------------------
>
> Yes, Erick, I agree with you that the score is not a percentage, and if the
> score of result #1 is twice that of #2, this does not mean it is twice as
> relevant. But the need for a threshold is valid.
>
> We always want to decrease the noise-to-signal ratio; when we have exact
> matches we don't want to show the rest of the noisy results.
>
> We know that when result #1 is more relevant than #2, its score is bigger.
> Whether the threshold should be based on a percentage, or on the log or atan
> of the score, is a matter of tuning.
>
> What worked for me is to stop when the score of the current result is less
> than 0.25 of the score of the previous result, where 0.25 is just a fudge
> factor. Since we compare each result with the previous one, a score that
> fades smoothly will continue to fade forever, while a sheer drop in score
> will stop the list.
>
>
>> Solr Score threshold 'reasonably', independent of results returned
>> ------------------------------------------------------------------
>>
>>                 Key: SOLR-3747
>>                 URL: https://issues.apache.org/jira/browse/SOLR-3747
>>             Project: Solr
>>          Issue Type: Task
>>          Components: Schema and Analysis
>>            Reporter: Ramzi Alqrainy
>>              Labels: documentation
>>         Attachments: Screen Shot 2012-08-21 at 5.30.38 AM.png
>>
>>   Original Estimate: 5h
>>  Remaining Estimate: 5h
>>
>> Usually, search results are sorted by their score (how well the document
>> matched the query), but it is common to need to support the sorting of
>> supplied data too.
>> Boosting affects the scores of matching documents in order to affect ranking
>> in score-sorted search results. Providing a boost value, whether at the
>> document or field level, is optional.
>> When the results are returned with scores, we want to be able to "keep" only
>> results that are above some score (i.e. results of a certain quality only).
>> Is it possible to do this when the returned subset could be anything?
>> I ask because it seems that on some queries a score of, say, 0.008 is a
>> decent match, whereas on other queries a higher score is a poor match.
>> I have written pseudocode to achieve what I described.
>> Note: I have attached my code as a screenshot.
>>       double scoreLimit = 0.75;  // for example
>>       List<Result> searchResults = new ArrayList<>();
>>       double lastScore = 0;  // 0 means "no previous result yet"
>>       solrSearchResults = /* results from calling the Solr engine */;
>>       for (Result result : solrSearchResults) {
>>         // stop once a score drops below scoreLimit times the previous score
>>         if (lastScore > 0 && result.score / lastScore < scoreLimit) break;
>>         lastScore = result.score;
>>         searchResults.add(result);
>>       }
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>