lucene-dev mailing list archives

From "Doron Cohen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1908) Similarity javadocs for scoring function to relate more tightly to scoring models in effect
Date Sat, 12 Sep 2009 20:56:57 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754610#action_12754610 ]

Doron Cohen commented on LUCENE-1908:
-------------------------------------

{quote}
I'm still a little confused I guess 
{quote}

That makes two of us... :)

{quote}
Longer docs will naturally have larger weights is what I meant, but larger weights actually
hurt in the cosine normalization, so it actually over-punishes, I guess? I dunno. All of
this over-punish/under-punish is in comparison to a relevancy curve they figure out (a probability
of relevance as a function of document length), and how the pivoted cosine curves compare
against it. I'm just reading across random interweb PDFs from Google. Apparently our pivot
also over-punishes large docs and over-favors small ones, the same as this one, but perhaps not
as badly? I'm seeing that in a Lucene/Juru research PDF. This stuff is hard to grok on first
pass.
{quote}

In that work we got the best results from Lucene (for TREC) with SweetSpotSimilarity and with
tf normalized by the average term frequency in the document.
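
To make the two ingredients concrete, here is a rough sketch in Python rather than Lucene code. The length norm follows the plateau formula documented for SweetSpotSimilarity. The tf normalization is only an assumption: this message does not state the exact formula used in that work, so the sketch uses one common log-damped form. The function and parameter names are hypothetical.

```python
import math


def sweet_spot_length_norm(num_terms, min_len, max_len, steepness):
    """Length norm with a flat plateau of 1.0 for min_len <= num_terms <= max_len.

    Outside the plateau the norm decays as 1/sqrt(steepness * distance + 1),
    where distance grows linearly with how far num_terms is from the plateau.
    """
    distance = (abs(num_terms - min_len)
                + abs(num_terms - max_len)
                - (max_len - min_len))
    return 1.0 / math.sqrt(steepness * distance + 1.0)


def avg_tf_normalized(freq, doc_length, unique_terms):
    """Term frequency normalized by the document's average term frequency.

    ASSUMPTION: log-damped form (1 + ln f) / (1 + ln avg_f); the exact
    formula used in the TREC work is not given in this thread.
    """
    avg_freq = doc_length / unique_terms
    return (1.0 + math.log(freq)) / (1.0 + math.log(avg_freq))


if __name__ == "__main__":
    # A 500-term doc inside a hypothetical 100..1000 sweet spot is not penalized.
    print(sweet_spot_length_norm(500, 100, 1000, 0.5))
    # A doc slightly past the plateau gets a norm below 1.0.
    print(sweet_spot_length_norm(1006, 100, 1000, 0.5))
    # A term occurring exactly as often as the doc's average term scores 1.0.
    print(avg_tf_normalized(2, doc_length=100, unique_terms=50))
```

The point of the plateau is that documents within the "sweet spot" length range are neither favored nor punished for their length. Dividing by the average term frequency keeps a long, repetitive document from winning on raw tf alone.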

For me it was mainly experimental and intuitive, but I was not able to convince Hoss (or even
convince myself once I read Hoss's comments) that this was justified theoretically.

At that time I was not aware of the subtlety in V(d) normalization that we are discussing now. I think
I understand things better now, but I am still not sure. I need to read http://nlp.stanford.edu/IR-book/html/htmledition/pivoted-normalized-document-length-1.html
and sleep on it...
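
For reference, the pivoted normalization idea from that chapter can be sketched in a few lines of Python. This is the textbook linear form, not anything Lucene implements as such. The names `pivot` and `slope` follow the usual presentation, and the sample values are made up for illustration.

```python
def pivoted_norm(old_norm, pivot, slope):
    """Pivoted length normalization: rotate the normalization line around the pivot.

    At old_norm == pivot the factor is unchanged. With slope < 1, documents
    whose norm is above the pivot are divided by less than their original
    norm (punished less), and documents below the pivot are divided by more
    (favored less).
    """
    return (1.0 - slope) * pivot + slope * old_norm


if __name__ == "__main__":
    pivot, slope = 10.0, 0.25  # illustrative values only
    for old in (4.0, 10.0, 20.0):
        print(old, pivoted_norm(old, pivot, slope))
```

A score would then be divided by `pivoted_norm(...)` instead of by the plain cosine norm, which is how the over-punishing of long documents gets corrected.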

> Similarity javadocs for scoring function to relate more tightly to scoring models in effect
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1908
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1908
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1908.patch, LUCENE-1908.patch
>
>
> See discussion in the related issue.
