lucene-dev mailing list archives

From "Tom Burton-West (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)
Date Tue, 13 Aug 2013 21:00:48 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738818#comment-13738818 ]

Tom Burton-West commented on LUCENE-5175:
-----------------------------------------

I wondered about that "crazy cache", in that it makes the implementation dependent on the
norms implementation.  

BTW: It looks to me as though, with Lucene's default norms, there are only about 130 or so
distinct "document lengths".  If there is no boosting going on, the byte value has to get to
124 for a doclen = 1, so there are only 255 - 124 = 131 possible different lengths.

i=124 norm=1.0,doclen=1.0
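
For anyone who wants to reproduce a table like the line above, here is a minimal standalone sketch. It does not call Lucene: `byte315ToFloat` below is a re-implementation that assumes the 3-mantissa-bit / 5-exponent-bit layout of `SmallFloat.byte315ToFloat`, and `decodedLength` assumes the unboosted default of norm = 1/sqrt(length). The class and method names are made up for illustration.

```java
// Standalone sketch; does not depend on Lucene.
class NormTableSketch {

    // Assumed to mirror SmallFloat.byte315ToFloat:
    // 3 mantissa bits, 5 exponent bits, zero-exponent point 15.
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Document length implied by a norm byte, assuming
    // norm = 1/sqrt(length) and no index-time boost.
    static float decodedLength(int i) {
        float norm = byte315ToFloat((byte) i);
        return 1.0f / (norm * norm);
    }

    public static void main(String[] args) {
        // Print one row per possible norm byte value.
        for (int i = 0; i < 256; i++) {
            float norm = byte315ToFloat((byte) i);
            System.out.println("i=" + i + " norm=" + norm + ",doclen=" + decodedLength(i));
        }
    }
}
```

Under these assumptions the row for i=124 matches the one quoted above, and scanning the rows on either side of it shows which byte values decode to lengths above 1, which is what the count of distinct lengths depends on.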
                
> Add parameter to lower-bound TF normalization for BM25 (for long documents)
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-5175
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5175
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Tom Burton-West
>            Priority: Minor
>         Attachments: LUCENE-5175.patch
>
>
> In the article "When Documents Are Very Long, BM25 Fails!" a fix for the problem is documented.
> There was a TODO note in BM25Similarity to add this fix. I will attach a patch that implements
> the fix shortly.


