hadoop-pig-dev mailing list archives

From "Russell Jurney (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1150) VAR() Variance UDF
Date Wed, 16 Dec 2009 19:56:18 GMT

     [ https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Russell Jurney updated PIG-1150:
--------------------------------

    Attachment: var.patch

This patch will not cut the mustard - it lacks Javadoc and test cases, and it's just plain
ugly.

That being said, people requested this on Twitter, so I'm pushing this one for people to use
if they want to.  I will get a passable patch up later this week.

> VAR() Variance UDF
> ------------------
>
>                 Key: PIG-1150
>                 URL: https://issues.apache.org/jira/browse/PIG-1150
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.5.0
>         Environment: UDF, written in Pig 0.5 contrib/
>            Reporter: Russell Jurney
>             Fix For: 0.7.0
>
>         Attachments: var.patch
>
>
> I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates variance in
> a distributed manner, based on the AVG() builtin.  It works by calculating the count, sum
> and sum of squares, as described here: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
> Is this a worthwhile contribution?  Taking the square root of this value using the contrib
> SQRT() function gives Standard Deviation, which is missing from Pig.
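
For readers who want to see the count / sum / sum-of-squares approach concretely, here is a minimal sketch in plain Java. It is not the code in var.patch and it does not use Pig's Algebraic interface. The class and method names (VarianceSketch, Partial, initial, combine, variance) are hypothetical, and it assumes population variance (dividing by n), which the issue description does not specify.

```java
public class VarianceSketch {

    /** Partial aggregate that each worker could emit: count, sum, sum of squares. */
    static final class Partial {
        final long count;
        final double sum;
        final double sumSq;

        Partial(long count, double sum, double sumSq) {
            this.count = count;
            this.sum = sum;
            this.sumSq = sumSq;
        }
    }

    /** Builds a partial aggregate from one chunk of raw values. */
    static Partial initial(double[] values) {
        long count = 0;
        double sum = 0.0;
        double sumSq = 0.0;
        for (double v : values) {
            count++;
            sum += v;
            sumSq += v * v;
        }
        return new Partial(count, sum, sumSq);
    }

    /** Merges two partial aggregates; all three fields simply add. */
    static Partial combine(Partial a, Partial b) {
        return new Partial(a.count + b.count, a.sum + b.sum, a.sumSq + b.sumSq);
    }

    /** Population variance (assumption): E[x^2] - (E[x])^2. NaN for empty input. */
    static double variance(Partial p) {
        if (p.count == 0) {
            return Double.NaN;
        }
        double mean = p.sum / p.count;
        return p.sumSq / p.count - mean * mean;
    }

    public static void main(String[] args) {
        // Two chunks, as if processed by separate workers.
        Partial left = initial(new double[] {2, 4, 4, 4});
        Partial right = initial(new double[] {5, 5, 7, 9});
        Partial total = combine(left, right);

        double var = variance(total);
        System.out.println(var);            // prints 4.0
        System.out.println(Math.sqrt(var)); // prints 2.0 (standard deviation)
    }
}
```

Because the three fields only ever add, partial results can be merged in any order, which is what makes this shape a natural fit for a combiner-style computation. One caveat: subtracting the squared mean from the mean of squares can lose precision when the variance is small relative to the mean, so a production version may want a more numerically stable merge.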

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

