Date: Thu, 14 Sep 2017 09:08:00 +0000 (UTC)
From: "Makoto Yui (JIRA)"
To: issues@hivemall.incubator.apache.org
Reply-To: dev@hivemall.incubator.apache.org
Subject: [jira] [Commented] (HIVEMALL-124) NDCG - BinaryResponseMeasure "fix"


    [ https://issues.apache.org/jira/browse/HIVEMALL-124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165970#comment-16165970 ]

Makoto Yui commented on HIVEMALL-124:
-------------------------------------

Changes were made in [this commit|https://github.com/myui/hivemall/blame/v0.5-alpha.1/core/src/main/java/hivemall/evaluation/BinaryResponsesMeasures.java#L45] for nDCG@k.

[~takuti] I think the current code has a bug and [librec's implementation|https://github.com/guoguibing/librec/blob/f49ee52686168a334ce558496ea3fb2fd42701ca/core/src/main/java/net/librec/eval/ranking/NormalizedDCGEvaluator.java#L66] is correct.

{code:java}
double idcg = IDCG(Math.min(recommendSize, groundTruth.size()));
for (int i = 0, n = recommendSize; i < n; i++) {
    Object item_id = rankedList.get(i); // may cause NPE!
    ..
{code}

should be

{code:java}
final int k = Math.min(rankedList.size(), recommendSize);
for (int i = 0; i < k; i++) {
    ..
}
double idcg = IDCG(Math.min(groundTruth.size(), k));
{code}

What do you think?
(cc: [~uhyonc])

> NDCG - BinaryResponseMeasure "fix"
> ----------------------------------
>
>                 Key: HIVEMALL-124
>                 URL: https://issues.apache.org/jira/browse/HIVEMALL-124
>             Project: Hivemall
>          Issue Type: Improvement
>            Reporter: Uhyon Chung
>            Assignee: Takuya Kitazawa
>
> There's a small issue which makes it a bit hard to use the NDCG@x
> from BinaryResponseMeasure.java
> {code:java}
>     public static double nDCG(@Nonnull final List rankedList,
>             @Nonnull final List groundTruth, @Nonnull final int recommendSize) {
>         double dcg = 0.d;
>         double idcg = IDCG(Math.min(recommendSize, groundTruth.size()));
> ...
>     public static double IDCG(final int n) {
>         double idcg = 0.d;
>         for (int i = 0; i < n; i++) {
>             idcg += Math.log(2) / Math.log(i + 2);
>         }
>         return idcg;
>     }
> {code}
> You'll notice that the IDCG for the binary NDCG calculation is computed from the number of items in groundTruth. The problem is that when we use "recommendSize" (e.g. NDCG@10) we may pass all of the ground truth and not just the items in the first 10. This is a bit unexpected. Of course, we could limit the truths using array intersection and so on, but users shouldn't really have to do that. You can simply count the number of matched ground truths so that this function is easier to use.
> e.g.
> {code:java}
>     public static double nDCG(@Nonnull final List rankedList,
>             @Nonnull final List groundTruth, @Nonnull final int recommendSize) {
>         double dcg = 0.d;
>         int matchedGroundTruths = 0;
>         for (int i = 0, n = recommendSize; i < n; i++) {
>             Object item_id = rankedList.get(i);
>             if (!groundTruth.contains(item_id)) {
>                 continue;
>             }
>             int rank = i + 1;
>             dcg += Math.log(2) / Math.log(rank + 1);
>             matchedGroundTruths++;
>         }
>         double idcg = IDCG(matchedGroundTruths);
>         return dcg / idcg;
>     }
> {code}
> Thanks

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
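
For reference, the change suggested in the comment above could look roughly like the following when written out in full. This is only a sketch, not the Hivemall code: the class name NdcgSketch and the List<?> parameter types are made up for illustration, and returning 0 when IDCG is 0 is an assumption that the thread does not discuss.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical class name, used only for this illustration.
final class NdcgSketch {

    /** Binary nDCG@k: each hit in the top k contributes log(2) / log(rank + 1). */
    static double nDCG(List<?> rankedList, List<?> groundTruth, int recommendSize) {
        // Bound the loop by the list size so a short ranked list is never read past its end.
        final int k = Math.min(rankedList.size(), recommendSize);
        double dcg = 0.d;
        for (int i = 0; i < k; i++) {
            if (groundTruth.contains(rankedList.get(i))) {
                int rank = i + 1;
                dcg += Math.log(2) / Math.log(rank + 1);
            }
        }
        // Ideal ranking: as many relevant items as fit in the top k, all placed first.
        final double idcg = IDCG(Math.min(groundTruth.size(), k));
        // Assumption (not from the thread): report 0 when there is nothing to normalize by.
        return idcg == 0.d ? 0.d : dcg / idcg;
    }

    /** Ideal DCG for n relevant items ranked at positions 1..n. */
    static double IDCG(int n) {
        double idcg = 0.d;
        for (int i = 0; i < n; i++) {
            idcg += Math.log(2) / Math.log(i + 2);
        }
        return idcg;
    }
}
```

With this sketch, NdcgSketch.nDCG(Arrays.asList(1, 2, 3), Arrays.asList(1, 2, 3), 3) evaluates to 1.0, and a call such as nDCG@10 on a ranked list holding a single item no longer reads past the end of the list. Normalizing by Math.min(groundTruth.size(), k) follows the comment's suggestion rather than the matched-count variant in the quoted description.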