hivemall-issues mailing list archives

From "Takuya Kitazawa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVEMALL-124) NDCG - BinaryResponseMeasure "fix"
Date Tue, 12 Sep 2017 09:41:00 GMT

    [ https://issues.apache.org/jira/browse/HIVEMALL-124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162735#comment-16162735 ]

Takuya Kitazawa commented on HIVEMALL-124:
------------------------------------------

As far as I can tell, I do not see what [~uhyonc] is concerned about; even if {{groundTruth}}
contains more than {{recommendSize}} items, there is no problem, because the following line
handles that case appropriately:

{code:java}
double idcg = IDCG(Math.min(recommendSize, groundTruth.size()));
{code}

Moreover, the revised code suggested by [~uhyonc] seems to be incorrect according to the
definition of NDCG. That is, IDCG corresponds to the ideal case in which "all {{groundTruth}}
items are placed at the head of {{rankedList}}," but the suggested code instead considers the
case in which "all *True Positive* items are placed at the head of {{rankedList}}."
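
To make the difference concrete, here is a minimal sketch that compares the two normalizations. The class {{NdcgSketch}} and its method names are hypothetical and are not part of Hivemall; only the {{IDCG}} loop and the {{Math.min(recommendSize, groundTruth.size())}} expression are taken from the code quoted in this issue. The bounds check on {{rankedList}} and the guard against a zero IDCG are assumptions added so the sketch runs safely.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch for illustration only; not the Hivemall implementation.
final class NdcgSketch {

    // Ideal DCG for binary relevance when n relevant items occupy ranks 1..n
    // (same loop as the IDCG method quoted in the issue).
    static double idcg(final int n) {
        double idcg = 0.d;
        for (int i = 0; i < n; i++) {
            idcg += Math.log(2) / Math.log(i + 2);
        }
        return idcg;
    }

    // DCG over the top recommendSize items of rankedList.
    // Assumption: the loop is clamped to rankedList.size() to avoid an out-of-range get().
    static double dcg(final List<?> rankedList, final List<?> groundTruth,
            final int recommendSize) {
        double dcg = 0.d;
        final int n = Math.min(recommendSize, rankedList.size());
        for (int i = 0; i < n; i++) {
            if (groundTruth.contains(rankedList.get(i))) {
                dcg += Math.log(2) / Math.log(i + 2);
            }
        }
        return dcg;
    }

    // Normalizes by the ideal case "all groundTruth items at the head of the list",
    // capped at recommendSize.
    static double ndcgByGroundTruth(final List<?> rankedList, final List<?> groundTruth,
            final int recommendSize) {
        final double idcg = idcg(Math.min(recommendSize, groundTruth.size()));
        // Assumption: return 0 when there is nothing to normalize by.
        return idcg == 0.d ? 0.d : dcg(rankedList, groundTruth, recommendSize) / idcg;
    }

    // Normalizes by the number of hits, i.e. "all true positives at the head of the list",
    // as in the variant suggested in the issue description.
    static double ndcgByHits(final List<?> rankedList, final List<?> groundTruth,
            final int recommendSize) {
        int hits = 0;
        final int n = Math.min(recommendSize, rankedList.size());
        for (int i = 0; i < n; i++) {
            if (groundTruth.contains(rankedList.get(i))) {
                hits++;
            }
        }
        // Assumption: return 0 when there are no hits, instead of dividing by zero.
        return hits == 0 ? 0.d : dcg(rankedList, groundTruth, recommendSize) / idcg(hits);
    }

    public static void main(String[] args) {
        List<String> rankedList = Arrays.asList("a", "x", "y");
        List<String> groundTruth = Arrays.asList("a", "b", "c");
        System.out.println(ndcgByGroundTruth(rankedList, groundTruth, 3));
        System.out.println(ndcgByHits(rankedList, groundTruth, 3));
    }
}
```

With {{rankedList = [a, x, y]}}, {{groundTruth = [a, b, c]}}, and {{recommendSize = 3}}, normalizing by the hit count returns 1.0 even though two of the three relevant items are missing, whereas normalizing by {{Math.min(recommendSize, groundTruth.size())}} returns roughly 0.47.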

> NDCG - BinaryResponseMeasure "fix"
> ----------------------------------
>
>                 Key: HIVEMALL-124
>                 URL: https://issues.apache.org/jira/browse/HIVEMALL-124
>             Project: Hivemall
>          Issue Type: Improvement
>            Reporter: Uhyon Chung
>            Assignee: Takuya Kitazawa
>
> There's a small issue which makes it a bit hard to use the NDCG@x
> from BinaryResponseMeasure.java
> {code:java}
>     public static double nDCG(@Nonnull final List<?> rankedList,
>             @Nonnull final List<?> groundTruth, @Nonnull final int recommendSize) {
>         double dcg = 0.d;
>         double idcg = IDCG(Math.min(recommendSize, groundTruth.size()));
> ...
>     public static double IDCG(final int n) {
>         double idcg = 0.d;
>         for (int i = 0; i < n; i++) {
>             idcg += Math.log(2) / Math.log(i + 2);
>         }
>         return idcg;
>     }
> {code}
> You'll notice that the way it calculates the idcg for binary NDCG calculation is that
> it uses the count in groundTruth. The problem is that when we use "recommendSize" (e.g. NDCG@10)
> we may pass all the ground truth and not just the ones in the first 10. This is a bit unexpected.
> Of course, we could just limit the truths using array intersection and whatnot, but the users
> shouldn't really have to do that. You can simply count the # of matched ground truths
> so it's easier to use this function.
> e.g.
> {code:java}
>     public static double nDCG(@Nonnull final List<?> rankedList,
>             @Nonnull final List<?> groundTruth, @Nonnull final int recommendSize) {
>         double dcg = 0.d;
>         int matchedGroundTruths = 0;
>         for (int i = 0, n = recommendSize; i < n; i++) {
>             Object item_id = rankedList.get(i);
>             if (!groundTruth.contains(item_id)) {
>                 continue;
>             }
>             int rank = i + 1;
>             dcg += Math.log(2) / Math.log(rank + 1);
>             matchedGroundTruths++;
>         }
>         double idcg = IDCG(matchedGroundTruths);
>         return dcg / idcg;
>     }
> {code}
> Thanks



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
