spark-issues mailing list archives

From "Xiangrui Meng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-10641) skewness and kurtosis support
Date Tue, 06 Oct 2015 16:31:26 GMT

    [ https://issues.apache.org/jira/browse/SPARK-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945309#comment-14945309 ]

Xiangrui Meng commented on SPARK-10641:
---------------------------------------

If we want to implement the numerically stable version, we should refactor the StdDevAgg implementation
to also maintain the third and fourth moments incrementally. StdDevAgg should then be renamed to CentralMomentAgg.
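
As a rough illustration of what such an aggregate would have to maintain, here is a minimal standalone sketch of the one-pass update and merge formulas from the Wikipedia page linked in the issue description. It is plain Python, not Spark code: the class name `CentralMoments` and its `update`/`merge` methods are hypothetical and do not reflect the actual StdDevAgg interface.

```python
from dataclasses import dataclass


@dataclass
class CentralMoments:
    """Running count, mean, and sums of powers of deviations (hypothetical sketch)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of (x - mean)^2
    m3: float = 0.0  # sum of (x - mean)^3
    m4: float = 0.0  # sum of (x - mean)^4

    def update(self, x: float) -> None:
        """Fold one value into the running moments."""
        n1 = self.n
        self.n += 1
        n = self.n
        delta = x - self.mean
        delta_n = delta / n
        delta_n2 = delta_n * delta_n
        term1 = delta * delta_n * n1
        self.mean += delta_n
        # Higher moments are updated first because they use the old m2 and m3.
        self.m4 += (term1 * delta_n2 * (n * n - 3 * n + 3)
                    + 6 * delta_n2 * self.m2
                    - 4 * delta_n * self.m3)
        self.m3 += term1 * delta_n * (n - 2) - 3 * delta_n * self.m2
        self.m2 += term1

    def merge(self, other: "CentralMoments") -> "CentralMoments":
        """Combine two partial results, e.g. from two partitions."""
        if self.n == 0:
            return CentralMoments(other.n, other.mean, other.m2, other.m3, other.m4)
        if other.n == 0:
            return CentralMoments(self.n, self.mean, self.m2, self.m3, self.m4)
        na, nb = self.n, other.n
        n = na + nb
        delta = other.mean - self.mean
        mean = self.mean + delta * nb / n
        m2 = self.m2 + other.m2 + delta ** 2 * na * nb / n
        m3 = (self.m3 + other.m3
              + delta ** 3 * na * nb * (na - nb) / n ** 2
              + 3 * delta * (na * other.m2 - nb * self.m2) / n)
        m4 = (self.m4 + other.m4
              + delta ** 4 * na * nb * (na * na - na * nb + nb * nb) / n ** 3
              + 6 * delta ** 2 * (na * na * other.m2 + nb * nb * self.m2) / n ** 2
              + 4 * delta * (na * other.m3 - nb * self.m3) / n)
        return CentralMoments(n, mean, m2, m3, m4)
```

The state is five numbers per group, and variance, skewness, and kurtosis can all be derived from it, which is the motivation for a single moment-based aggregate.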

In the future, we need to make sure that codegen doesn't include unnecessary branches if kurtosis
and skewness are not requested by the user.

Btw, there is some room for optimization, e.g.

{code}
df.groupBy("key").agg(skewness("a"), kurtosis("a"))
{code}

will duplicate computation, since both aggregates need the same lower-order moments.
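
To make the overlap concrete, the following standalone Python sketch derives both statistics from one shared set of moment sums. The names `Moments`, `skewness_from`, `excess_kurtosis_from`, and `summarize` are hypothetical, the moment sums are computed with a simple two-pass loop for brevity, and the population definitions used here (skewness as sqrt(n) * M3 / M2^1.5, excess kurtosis as n * M4 / M2^2 - 3) are an assumption about which variants are wanted.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class Moments(NamedTuple):
    """Count and sums of powers of deviations from the mean (hypothetical)."""
    n: int
    m2: float
    m3: float
    m4: float


def skewness_from(m: Moments) -> float:
    # Population skewness; undefined when all values are equal.
    if m.m2 == 0.0:
        return float("nan")
    return math.sqrt(m.n) * m.m3 / m.m2 ** 1.5


def excess_kurtosis_from(m: Moments) -> float:
    # Population excess kurtosis; undefined when all values are equal.
    if m.m2 == 0.0:
        return float("nan")
    return m.n * m.m4 / (m.m2 * m.m2) - 3.0


def summarize(values: Sequence[float]) -> Tuple[float, float]:
    """Compute the moment sums once and derive both statistics from them."""
    n = len(values)
    mean = sum(values) / n
    devs = [v - mean for v in values]
    shared = Moments(
        n=n,
        m2=sum(d ** 2 for d in devs),
        m3=sum(d ** 3 for d in devs),
        m4=sum(d ** 4 for d in devs),
    )
    return skewness_from(shared), excess_kurtosis_from(shared)
```

An optimizer that recognized both aggregates as functions of the same moment state could evaluate that state once per group instead of once per aggregate.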

> skewness and kurtosis support
> -----------------------------
>
>                 Key: SPARK-10641
>                 URL: https://issues.apache.org/jira/browse/SPARK-10641
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, SQL
>            Reporter: Jihong MA
>            Assignee: Seth Hendrickson
>
> Implementing skewness and kurtosis support based on the following algorithm:
> https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Higher-order_statistics




