pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Russell Jurney (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-1150) VAR() Variance UDF
Date Wed, 26 Sep 2012 23:23:10 GMT

    [ https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464278#comment-13464278

Russell Jurney commented on PIG-1150:

I can't find variance in DataFu. I'm going to fix this in Piggybank, one day.
> VAR() Variance UDF
> ------------------
>                 Key: PIG-1150
>                 URL: https://issues.apache.org/jira/browse/PIG-1150
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.5.0
>         Environment: UDF, written in Pig 0.5 contrib/
>            Reporter: Russell Jurney
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: var.patch
> I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates variance in
a distributed manner, based on the AVG() builtin.  It works by calculating the count, sum
and sum of squares, as described here: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
> Is this a worthwhile contribution?  Taking the square root of this value using the contrib
SQRT() function gives Standard Deviation, which is missing from Pig.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

View raw message