Date: Tue, 6 Oct 2015 16:31:26 +0000 (UTC)
From: "Xiangrui Meng (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-10641) skewness and kurtosis support

[ https://issues.apache.org/jira/browse/SPARK-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945309#comment-14945309 ]

Xiangrui Meng commented on SPARK-10641:
---------------------------------------

If we want to implement the numerically stable version, we should refactor the StdDevAgg implementation so that it also maintains running third and fourth moments, and then rename StdDevAgg to CentralMomentAgg. In the future, we need to make sure that codegen doesn't include unnecessary branches when the user doesn't ask for kurtosis or skewness.

Btw, there will be some room for optimization. For example,

{code}
df.groupBy("key").agg(skewness("a"), kurtosis("a"))
{code}

will duplicate computation.
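To illustrate the refactoring idea, here is a minimal pure-Python sketch of a single aggregation buffer that tracks count, mean, and the second, third, and fourth central moment sums, with a pairwise merge step like the one a distributed aggregate needs to combine partitions. The class name `CentralMoments` and its methods are hypothetical; this is not Spark's StdDevAgg or CentralMomentAgg code. The merge formulas follow the Wikipedia "Higher-order statistics" section linked in the issue, and the skewness/kurtosis normalizations are the population forms used there. Whether Spark uses exactly the same normalization is not stated in this thread.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass
class CentralMoments:
    """Hypothetical buffer: count, mean, and M2/M3/M4, the sums of the
    2nd/3rd/4th powers of deviations from the mean."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0
    m3: float = 0.0
    m4: float = 0.0

    def merge(self, other: CentralMoments) -> CentralMoments:
        """Combine two partial buffers using the pairwise update formulas."""
        n_a, n_b = self.n, other.n
        n = n_a + n_b
        if n == 0:
            return CentralMoments()
        delta = other.mean - self.mean
        d_n = delta / n
        mean = self.mean + d_n * n_b
        m2 = self.m2 + other.m2 + delta * d_n * n_a * n_b
        # M3 and M4 use the *old* M2/M3 of both inputs.
        m3 = (self.m3 + other.m3
              + delta * d_n ** 2 * n_a * n_b * (n_a - n_b)
              + 3.0 * d_n * (n_a * other.m2 - n_b * self.m2))
        m4 = (self.m4 + other.m4
              + delta * d_n ** 3 * n_a * n_b * (n_a * n_a - n_a * n_b + n_b * n_b)
              + 6.0 * d_n ** 2 * (n_a * n_a * other.m2 + n_b * n_b * self.m2)
              + 4.0 * d_n * (n_a * other.m3 - n_b * self.m3))
        return CentralMoments(n, mean, m2, m3, m4)

    def update(self, x: float) -> CentralMoments:
        """Add one value: a merge with a single-element buffer (M2=M3=M4=0)."""
        return self.merge(CentralMoments(1, float(x)))

    # The statistics below are undefined when m2 == 0 (all inputs equal);
    # this sketch simply raises ZeroDivisionError in that case.
    @property
    def variance(self) -> float:
        """Population variance."""
        return self.m2 / self.n

    @property
    def skewness(self) -> float:
        return math.sqrt(self.n) * self.m3 / self.m2 ** 1.5

    @property
    def kurtosis(self) -> float:
        """Excess kurtosis (normal distribution -> 0)."""
        return self.n * self.m4 / (self.m2 * self.m2) - 3.0


def aggregate(values: Iterable[float]) -> CentralMoments:
    """Fold values into one buffer, one at a time."""
    return reduce(CentralMoments.update, values, CentralMoments())


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    whole = aggregate(data)
    # Partial buffers, as if computed on two partitions, then merged.
    merged = aggregate(data[:3]).merge(aggregate(data[3:]))
    # One buffer answers both queries, so skewness and kurtosis share a pass.
    print(round(whole.skewness, 6), round(whole.kurtosis, 6))  # 0.65625 -0.21875
    print(round(merged.skewness, 6), round(merged.kurtosis, 6))
```

Because both statistics read from the same buffer, a planner that recognizes `skewness("a")` and `kurtosis("a")` over the same column could, in principle, share one buffer instead of maintaining two.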
> skewness and kurtosis support
> -----------------------------
>
>                 Key: SPARK-10641
>                 URL: https://issues.apache.org/jira/browse/SPARK-10641
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, SQL
>            Reporter: Jihong MA
>            Assignee: Seth Hendrickson
>
> Implementing skewness and kurtosis support based on the following algorithm:
> https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Higher-order_statistics
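For reference, the combination formulas from the linked Wikipedia section, as we read them, are summarized below for two partial aggregates A and B, where $M_k = \sum_i (x_i - \bar{x})^k$. The last line gives the population skewness $g_1$ and excess kurtosis $g_2$ used in that article's pseudocode; the exact normalization Spark adopts is not specified in this issue.

```latex
\begin{aligned}
n &= n_A + n_B, \qquad \delta = \bar{x}_B - \bar{x}_A \\
\bar{x} &= \bar{x}_A + \delta \frac{n_B}{n} \\
M_2 &= M_{2,A} + M_{2,B} + \delta^2 \frac{n_A n_B}{n} \\
M_3 &= M_{3,A} + M_{3,B} + \delta^3 \frac{n_A n_B (n_A - n_B)}{n^2}
      + 3\delta \frac{n_A M_{2,B} - n_B M_{2,A}}{n} \\
M_4 &= M_{4,A} + M_{4,B} + \delta^4 \frac{n_A n_B (n_A^2 - n_A n_B + n_B^2)}{n^3}
      + 6\delta^2 \frac{n_A^2 M_{2,B} + n_B^2 M_{2,A}}{n^2}
      + 4\delta \frac{n_A M_{3,B} - n_B M_{3,A}}{n} \\
g_1 &= \frac{\sqrt{n}\, M_3}{M_2^{3/2}}, \qquad g_2 = \frac{n\, M_4}{M_2^2} - 3
\end{aligned}
```

Setting $n_B = 1$ and $M_{2,B} = M_{3,B} = M_{4,B} = 0$ reduces these to the one-value-at-a-time online update.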