lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities
Date Fri, 05 Aug 2011 13:19:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079961#comment-13079961
] 

Robert Muir commented on LUCENE-3220:
-------------------------------------

Thanks, I committed your latest patch, some ideas just perusing:
* we can move the calculations currently in decodeNormValue into the static table, this way
we aren't doing these per-document multiplications and divisions... so decodeNormValue just
returns the document length.
* should easysim change its score method from score(Stats stats, float freq, byte norm) to
score(Stats stats, float freq, int documentLength) ? then it could encapsulate this encoding/decoding.
* I think we should try to factor in the index-time boost in computeNorm here if we can, e.g.
just divide the document length by it? So documents with a higher index-time boost have a
shorter length.

> Implement various ranking models as Similarities
> ------------------------------------------------
>
>                 Key: LUCENE-3220
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3220
>             Project: Lucene - Java
>          Issue Type: Sub-task
>          Components: core/query/scoring, core/search
>    Affects Versions: flexscoring branch
>            Reporter: David Mark Nemeskey
>            Assignee: David Mark Nemeskey
>              Labels: gsoc, gsoc2011
>         Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally
work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers
and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message