lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities
Date Tue, 02 Aug 2011 12:43:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076171#comment-13076171 ]

Robert Muir commented on LUCENE-3220:
-------------------------------------

Hi David, I was thinking that for the norm, we could store it the same way DefaultSimilarity
does. This would be especially convenient, since you could use these similarities with the
exact same index as one built for Lucene's default scoring. Also, I think (not sure!) that by
using 1/sqrt we will get better quantization from SmallFloat?

{noformat}
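  // As in DefaultSimilarity: encode boost * 1/sqrt(numTerms) into a single byte.
  public byte computeNorm(FieldInvertState state) {
    final int numTerms;
    if (discountOverlaps)
      // do not count tokens with a position increment of 0 (e.g. synonyms)
      numTerms = state.getLength() - state.getNumOverlap();
    else
      numTerms = state.getLength();
    return encodeNormValue(state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms))));
  }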
{noformat}

For computations, you have to 'undo' the sqrt() to get the quantized length, but that's OK
since it's only done once up front and stored in a table, so it won't slow anything down.
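
To make the 'undo' step concrete, here is a rough sketch of how such a table could be built. The class and method names (NormLengthTable, encodeNorm, decodeNorm, build) are made up for illustration and are not part of the patch. The encode/decode pair is a hand-written stand-in modeled on SmallFloat's 3-bit-mantissa byte format, so treat the exact bit layout as an assumption. It also assumes a boost of 1.0, otherwise the recovered value is only an "effective" length.

```java
/** Sketch: a 256-entry table mapping each norm byte back to a quantized field length. */
final class NormLengthTable {

  // Assumed 8-bit float layout: 3 mantissa bits, 5 exponent bits, zero exponent 15.
  static byte encodeNorm(float f) {
    int bits = Float.floatToRawIntBits(f);
    int small = bits >> (24 - 3);
    if (small <= ((63 - 15) << 3)) {
      // zero or negative maps to 0; positive underflow maps to the smallest positive value
      return (bits <= 0) ? (byte) 0 : (byte) 1;
    }
    if (small >= ((63 - 15) << 3) + 0x100) {
      return -1; // overflow maps to the largest value (0xFF)
    }
    return (byte) (small - ((63 - 15) << 3));
  }

  // Inverse of encodeNorm, up to quantization.
  static float decodeNorm(byte b) {
    if (b == 0) return 0.0f;
    int bits = (b & 0xff) << (24 - 3);
    bits += (63 - 15) << 24;
    return Float.intBitsToFloat(bits);
  }

  // The norm is stored as 1/sqrt(length), so length = 1 / (norm * norm).
  // Computed once for all 256 possible bytes; scoring then only does an array lookup.
  static float[] build() {
    float[] lengths = new float[256];
    for (int i = 0; i < 256; i++) {
      float norm = decodeNorm((byte) i);
      lengths[i] = 1.0f / (norm * norm); // byte 0 decodes to 0, giving Infinity
    }
    return lengths;
  }
}
```

At scoring time the lookup would then be something like lengths[normByte & 0xff]. Lengths whose 1/sqrt is not exactly representable in the byte format come back only approximately, which is the quantization mentioned above.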


> Implement various ranking models as Similarities
> ------------------------------------------------
>
>                 Key: LUCENE-3220
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3220
>             Project: Lucene - Java
>          Issue Type: Sub-task
>          Components: core/query/scoring, core/search
>    Affects Versions: flexscoring branch
>            Reporter: David Mark Nemeskey
>            Assignee: David Mark Nemeskey
>              Labels: gsoc, gsoc2011
>         Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
>                      LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
>                      LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
>                      LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally
> work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers
>    and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
