lucene-dev mailing list archives

From "alsadi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3747) Solr Score threshold 'reasonably', independent of results returned
Date Wed, 22 Aug 2012 10:00:38 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439388#comment-13439388
] 

alsadi commented on SOLR-3747:
------------------------------

Yes, Eric, I agree with you that the score is not a percentage, and that a score for result #1
that is twice the score of #2 does not mean #1 is twice as relevant. But the need for a
threshold is valid.

We always want to reduce the noise-to-signal ratio: when we have exact matches, we don't want
to show the remaining noisy results.

We know that when result #1 is more relevant than #2, its score is higher. Whether the
threshold should be based on a percentage, or on the log or atan of the score, is a matter
of tuning.

What worked for me is to stop when the score of the current result is less than 0.25 of the
score of the previous result, where 0.25 is just a fudge factor.
Since each result is compared with the previous one, a score that fades smoothly never
triggers the cutoff and the list continues, while a sheer drop in score stops it.
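
A minimal sketch of that stopping rule in Java, assuming the scores arrive sorted in
descending order and are positive. The class and method names (ScoreCutoff,
keepUntilSharpDrop) are hypothetical and not part of Solr; the ratio argument plays the role
of the 0.25 fudge factor.

```java
import java.util.ArrayList;
import java.util.List;

final class ScoreCutoff {
    /**
     * Keeps scores from the front of a descending list and stops at the first
     * score that is less than ratio * (score of the previous result).
     * Hypothetical helper for illustration only; not a Solr API.
     */
    static List<Double> keepUntilSharpDrop(List<Double> scores, double ratio) {
        List<Double> kept = new ArrayList<>();
        double previous = Double.NaN; // NaN marks "no previous result yet"
        for (double score : scores) {
            // Any comparison with NaN is false, so the first result is always kept.
            if (score < ratio * previous) {
                break;
            }
            kept.add(score);
            previous = score;
        }
        return kept;
    }
}
```

With a ratio of 0.25, the scores 8.0, 6.0, 4.5, 3.4 are all kept because no step falls below
a quarter of its predecessor, while 8.0, 7.5, 1.0, 0.9 is cut after 7.5 because 1.0 is less
than 0.25 * 7.5.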

                
> Solr Score threshold 'reasonably', independent of results returned
> ------------------------------------------------------------------
>
>                 Key: SOLR-3747
>                 URL: https://issues.apache.org/jira/browse/SOLR-3747
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>            Reporter: Ramzi Alqrainy
>              Labels: documentation
>         Attachments: Screen Shot 2012-08-21 at 5.30.38 AM.png
>
>   Original Estimate: 5h
>  Remaining Estimate: 5h
>
> Usually, search results are sorted by their score (how well the document matched the
> query), but it is common to need to support the sorting of supplied data too.
> Boosting affects the scores of matching documents in order to affect ranking in score-sorted
> search results. Providing a boost value, whether at the document or field level, is optional.
>
> When the results are returned with scores, we want to be able to "keep" only results
> that are above some score (i.e. results of a certain quality only). Is it possible to do this
> when the returned subset could be anything?
> I ask because it seems like on some queries a score of, say, 0.008 results in a decent
> match, whereas on other queries a higher score results in a poor match.
> I have written pseudocode to achieve what I described.
> Note: I have attached my code as a screenshot.
>       // Stop at the first result whose score is below scoreLimit times the previous score.
>       double scoreLimit = 0.75; // for example
>       List<Result> searchResults = new ArrayList<>();
>       double lastScore = -1; // negative means "no previous result yet"
>       List<Result> solrSearchResults = querySolr(); // placeholder for the call to the Solr engine
>       for (Result result : solrSearchResults) {
>         if (lastScore > 0 && result.score / lastScore < scoreLimit) break;
>         lastScore = result.score;
>         searchResults.add(result);
>       }
