lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Filtering results by minimum relevancy score
Date Wed, 12 Apr 2017 15:28:07 GMT
Well, just because ES has it doesn't mean it's A Good Thing. IMO, it's
just a "feel good" kind of thing for people who don't really
understand scoring.

From that page: "Note, most times, this does not make much sense, but
is provided for advanced use cases."

I've written enough weasel-worded caveats to read the hidden message
here (freely translated and purged of expletives):

"OK, if you insist we'll provide this, and we'll make you feel good by
saying it's for 'advanced use cases'. We don't expect this to be
useful at all, but it's easy to do and we'll waste more time arguing
than just putting it in. P.S. don't call us when you find out this is
useless".

Best,
Erick

On Wed, Apr 12, 2017 at 7:37 AM, Shawn Heisey <apache@elyograg.org> wrote:
> On 4/10/2017 8:59 AM, David Kramer wrote:
>> I’ve done quite a bit of searching on this. Pretty much every page I
>> find says it’s a bad idea and won’t work well, but I’ve been asked to
>> at least try it to reduce the number of completely unrelated results
>> returned. We are not trying to normalize the number, or display it as
>> a percentage, and I understand why those are not mathematically sound.
>> We are relying on Solr for pagination, so we can’t just filter out low
>> scores from the results.
>
> Here's my contribution.  This boils down to nearly the same thing Erick
> said, but stated in a very different way: The absolute score value has
> zero meaning, for ANY purpose ... not just percentages or
> normalization.  If you try to use it, you're asking for disappointment.
>
> Scores only have meaning within a single query, and the only information
> that's important is whether the score of one document is higher or lower
> than the score of the rest of the documents in the same result.
> Boosting lets you influence those relative scores, but the actual
> numeric score of one document in a result doesn't reveal ANYTHING useful
> about that document.
>
> I agree with Erick's general advice:  Instead of trying to arbitrarily
> decide which documents are scoring too low to be relevant, refine the
> query so that irrelevant results are either completely excluded, or so
> relevant documents will outscore irrelevant ones and the first few pages
> will be good results.  Users must be trained to expect irrelevant (and
> slow) results if they paginate deeply.  For performance reasons, you
> should limit how many pages users can view on a result.
>
> Thanks,
> Shawn
>
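
A rough illustration of the advice quoted above (not code from this
thread): rather than cutting on score, tighten the query and cap how
deep users can page. The sketch assumes the edismax parser and its "mm"
(minimum should match) parameter; the helper name, the page cap, the
page size and the 75% default are all hypothetical values to tune
against your own data.

```python
MAX_PAGE = 10        # hypothetical cap on how deep users may paginate
ROWS_PER_PAGE = 20   # hypothetical page size


def build_solr_params(user_query, page, mm="75%"):
    """Build request parameters for one page of Solr results.

    Hypothetical helper: it only assembles a dict of standard Solr
    parameters (q, defType, mm, start, rows) and sends nothing.
    """
    # Clamp the requested page into [1, MAX_PAGE] so deep paging
    # is never requested from Solr.
    page = max(1, min(page, MAX_PAGE))
    return {
        "q": user_query,
        "defType": "edismax",
        # Require most query terms to match, so documents that match
        # only one stray term are excluded instead of scored low.
        "mm": mm,
        "start": (page - 1) * ROWS_PER_PAGE,
        "rows": ROWS_PER_PAGE,
    }
```

Pagination stays in Solr, since start and rows are still passed
through; a request for page 500 simply comes back as the last allowed
page.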
