Date: Tue, 4 Jul 2017 02:05:00 +0000 (UTC)
From: "Makoto Yui (JIRA)"
To: issues@hivemall.incubator.apache.org
Reply-To: dev@hivemall.incubator.apache.org
Subject: [jira] [Comment Edited] (HIVEMALL-124) NDCG - BinaryResponseMeasure "fix"


    [ https://issues.apache.org/jira/browse/HIVEMALL-124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16073019#comment-16073019 ]

Makoto Yui edited comment on HIVEMALL-124 at 7/4/17 2:04 AM:
-------------------------------------------------------------

[~uhyonc] Thanks for your report.

[~takuti] Could you take a look?


was (Author: myui):
[~takuti] Could you take a look?

> NDCG - BinaryResponseMeasure "fix"
> ----------------------------------
>
>                 Key: HIVEMALL-124
>                 URL: https://issues.apache.org/jira/browse/HIVEMALL-124
>             Project: Hivemall
>          Issue Type: Improvement
>            Reporter: Uhyon Chung
>            Assignee: Takuya Kitazawa
>
> There's a small issue which makes it a bit hard to use the NDCG@x
> from BinaryResponseMeasure.java
> {code:java}
>     public static double nDCG(@Nonnull final List rankedList,
>             @Nonnull final List groundTruth, @Nonnull final int recommendSize) {
>         double dcg = 0.d;
>         double idcg = IDCG(Math.min(recommendSize, groundTruth.size()));
>         ...
>     public static double IDCG(final int n) {
>         double idcg = 0.d;
>         for (int i = 0; i < n; i++) {
>             idcg += Math.log(2) / Math.log(i + 2);
>         }
>         return idcg;
>     }
> {code}
> You'll notice that it calculates the IDCG for the binary NDCG calculation from the count in groundTruth. The problem is that when we use "recommendSize" (e.g. NDCG@10), we may pass all of the ground truth and not just the items in the first 10. This is a bit unexpected. Of course, we could just limit the truths using array intersection and whatnot, but users shouldn't really have to do that. You can simply count the number of matched ground truths so that this function is easier to use.
> e.g.
> {code:java}
>     public static double nDCG(@Nonnull final List rankedList,
>             @Nonnull final List groundTruth, @Nonnull final int recommendSize) {
>         double dcg = 0.d;
>         int matchedGroundTruths = 0;
>         for (int i = 0, n = recommendSize; i < n; i++) {
>             Object item_id = rankedList.get(i);
>             if (!groundTruth.contains(item_id)) {
>                 continue;
>             }
>             int rank = i + 1;
>             dcg += Math.log(2) / Math.log(rank + 1);
>             matchedGroundTruths++;
>         }
>         double idcg = IDCG(matchedGroundTruths);
>         return dcg / idcg;
>     }
> {code}
> Thanks



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
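
To make the difference between the two quoted snippets concrete, here is a minimal, self-contained sketch that evaluates both IDCG choices on the same input. The class name `NdcgSketch` and the method names `ndcgCurrent` and `ndcgProposed` are hypothetical and not part of Hivemall. The DCG loop in `ndcgCurrent` is an assumption, since that part is elided in the report; it is taken to match the loop in the proposed version. Bounding the loop by `rankedList.size()` is also an addition of this sketch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the name and the method split are illustrative only.
final class NdcgSketch {

    // IDCG for binary relevance, mirroring the IDCG snippet quoted in the issue.
    static double idcg(final int n) {
        double idcg = 0.d;
        for (int i = 0; i < n; i++) {
            idcg += Math.log(2) / Math.log(i + 2);
        }
        return idcg;
    }

    // Behavior described as current: IDCG sized by min(recommendSize, groundTruth.size()).
    // Assumption: the elided loop accumulates DCG the same way as the proposed version.
    static double ndcgCurrent(final List<?> rankedList, final List<?> groundTruth,
            final int recommendSize) {
        double dcg = 0.d;
        // Bounding by rankedList.size() is an addition here, not part of the quoted code.
        final int n = Math.min(recommendSize, rankedList.size());
        for (int i = 0; i < n; i++) {
            if (groundTruth.contains(rankedList.get(i))) {
                dcg += Math.log(2) / Math.log(i + 2); // rank = i + 1, so rank + 1 = i + 2
            }
        }
        return dcg / idcg(Math.min(recommendSize, groundTruth.size()));
    }

    // Proposed behavior: IDCG sized by the number of ground truths matched in the top k.
    static double ndcgProposed(final List<?> rankedList, final List<?> groundTruth,
            final int recommendSize) {
        double dcg = 0.d;
        int matchedGroundTruths = 0;
        final int n = Math.min(recommendSize, rankedList.size());
        for (int i = 0; i < n; i++) {
            if (groundTruth.contains(rankedList.get(i))) {
                dcg += Math.log(2) / Math.log(i + 2);
                matchedGroundTruths++;
            }
        }
        return dcg / idcg(matchedGroundTruths);
    }

    public static void main(String[] args) {
        // Ground truth holds more items than the cutoff k = 2, as in the report's scenario.
        List<String> ranked = Arrays.asList("a", "b", "c", "d");
        List<String> truth = Arrays.asList("b", "d", "x", "y", "z");
        System.out.println("current:  " + ndcgCurrent(ranked, truth, 2));
        System.out.println("proposed: " + ndcgProposed(ranked, truth, 2));
    }
}
```

With this input only "b" at rank 2 matches, so DCG is log 2 / log 3, roughly 0.631. The current sizing divides by IDCG(2) = 1 + 0.631 and yields roughly 0.387, while the proposed sizing divides by IDCG(1) = 1 and yields roughly 0.631. One thing worth checking before adopting the proposal: when no ranked item matches, both DCG and IDCG are 0.0, and 0.0 / 0.0 evaluates to NaN in Java, so a caller would probably want an explicit guard for that case.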