From: iyerr3
To: dev@madlib.apache.org
Reply-To: dev@madlib.apache.org
Subject: [GitHub] madlib pull request #231: RF: Output non-negative importance values
Content-Type: text/plain
Date: Tue, 6 Feb 2018 16:37:15 +0000 (UTC)

GitHub user iyerr3 opened a pull request:

https://github.com/apache/madlib/pull/231

RF: Output non-negative importance values

Variable importance is computed in RF as the difference in prediction accuracy between original data and permuted data from out-of-bag samples (OOB). Permuted data is defined as each variable resampled from its own distribution.
This value can end up being negative if the number of levels for a variable is small and is unbalanced, as the redistribution doesn't change the data much.

This commit shifts all the importance values if some of them are negative to ensure that the lowest importance value is 0.

Closes #231

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib bugfix/rf_neg_var_imp

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/231.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #231

----
commit f4265854dd94899145c9b40d4ce77450f34bdd78
Author: Rahul Iyer
Date:   2018-02-06T16:20:49Z

    RF: Output non-negative importance values

    Variable importance is computed in RF as the difference in prediction accuracy between original data and permuted data from out-of-bag samples (OOB). Permuted data is defined as each variable resampled from its own distribution.
    This value can end up being negative if the number of levels for a variable is small and is unbalanced, as the redistribution doesn't change the data much.

    This commit shifts all the importance values if some of them are negative to ensure that the lowest importance value is 0.

    Closes #231

----
---
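
For readers who want to see the shift described in the commit message in concrete terms, here is a minimal Python sketch. It is not the code from the patch: the function name `shift_to_non_negative` and the plain-list input are assumptions made for illustration only. It shows the stated rule: if any importance value is negative, every value is shifted by the same amount so that the lowest becomes 0; otherwise the values are left alone.

```python
from typing import List


def shift_to_non_negative(importances: List[float]) -> List[float]:
    """Hypothetical helper, not the MADlib implementation.

    If any value is negative, shift every value up by the same amount so
    that the smallest value becomes 0. Otherwise return the values as-is.
    """
    if not importances:
        return []
    lowest = min(importances)
    if lowest >= 0:
        # Nothing is negative, so no shift is applied.
        return list(importances)
    # Subtracting the (negative) minimum moves the lowest value to 0.
    return [value - lowest for value in importances]


if __name__ == "__main__":
    print(shift_to_non_negative([-2.0, 0.0, 3.0]))
    print(shift_to_non_negative([1.0, 2.0]))
```

A uniform shift of this kind keeps the ranking of the variables and the differences between their importance values, but it does not preserve the ratios between them.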