lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8072) Improve accuracy of similarity scores
Date Fri, 01 Dec 2017 15:06:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274493#comment-16274493 ]

Robert Muir commented on LUCENE-8072:
-------------------------------------

As for the changes to double precision, we should be careful here too. The test improvements
for LUCENE-8015 really need to be applied before we make any alterations to the formulas,
because the current tests are too inefficient.

Similarity has to deal with crazy values for a variety of reasons in Lucene, and our first
challenge is to get all of our scoring behaving properly, with the monotonicity we need for
optimizations. Extra precision in various places may or may not help with that; either way,
let's avoid playing whack-a-mole :)
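
To make the monotonicity concern concrete, here is a minimal sketch of the kind of check a test could run. It is not taken from LUCENE-8015 or from Lucene's test suite: the class name, the helper `firstViolation`, and the float formula in `main` are all hypothetical, chosen only to illustrate the idea that a score should never decrease as term frequency increases.

```java
import java.util.function.IntToDoubleFunction;

public class MonotonicityCheck {

    /**
     * Hypothetical helper: returns the first freq in [1, maxFreq) for which
     * score(freq + 1) < score(freq), or -1 if the score never decreases.
     */
    static int firstViolation(IntToDoubleFunction score, int maxFreq) {
        for (int freq = 1; freq < maxFreq; freq++) {
            if (score.applyAsDouble(freq + 1) < score.applyAsDouble(freq)) {
                return freq;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed example: a BM25-style saturation term evaluated in float.
        // The constant 1.2f is an arbitrary illustrative value.
        float k = 1.2f;
        IntToDoubleFunction tf = freq -> (float) freq / ((float) freq + k);
        System.out.println(firstViolation(tf, 1_000_000));
    }
}
```

A check like this says nothing about whether extra precision helps; it only reports where a given formula stops being non-decreasing, which is the property to pin down before changing the arithmetic.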

> Improve accuracy of similarity scores
> -------------------------------------
>
>                 Key: LUCENE-8072
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8072
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8072.patch
>
>
> I noticed two things we could do to improve the accuracy of our scores:
>  - use {{Math.log1p(x)}} instead of {{Math.log(1+x)}}, especially when x is expected to be small
>  - use doubles for intermediate values that are used to compute norms in BM25Similarity
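
The first point in the description can be illustrated with a small standalone sketch. The class and method names below are hypothetical and are not part of the attached patch; only `Math.log` and `Math.log1p` are real JDK methods. For a very small x, the sum `1 + x` rounds to exactly `1.0` before the logarithm is taken, so the naive form loses the value entirely, while `Math.log1p` keeps it.

```java
public class Log1pDemo {

    // Naive form: 1 + x is rounded to a double before the logarithm is applied.
    static double naiveLog(double x) {
        return Math.log(1 + x);
    }

    // log1p computes ln(1 + x) without forming the rounded sum first.
    static double preciseLog(double x) {
        return Math.log1p(x);
    }

    public static void main(String[] args) {
        double x = 1e-17; // smaller than half an ulp of 1.0, so 1 + x == 1.0
        System.out.println("Math.log(1 + x) = " + naiveLog(x));
        System.out.println("Math.log1p(x)   = " + preciseLog(x));
    }
}
```

With this input the naive form returns zero, whereas `Math.log1p` returns a value very close to x itself, as expected from ln(1 + x) ≈ x for small x.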



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

