From "Tamir Kamara (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1150) VAR() Variance UDF
Date Thu, 17 Dec 2009 13:04:18 GMT
```
[ https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791909#action_12791909
]

Tamir Kamara commented on PIG-1150:
-----------------------------------

This would be very useful to me, so I tested your patch, but I got weird results. I believe
the problem is in the combine method: it treats each tuple as if it contained the original
values, but to my understanding it should work on the intermediate output (sum, count, sum
of squares) and do something like this:

{code}
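static protected Tuple combine(DataBag values) throws ExecException {
    // Running totals over the partial results, not over raw input values.
    double sum = 0;
    long count = 0;
    double sumOfSquares = 0;

    Tuple output = mTupleFactory.newTuple(3);

    for (Iterator<Tuple> it = values.iterator(); it.hasNext();) {
        Tuple t = it.next();

        // Each tuple is an intermediate result: (sum, count, sumOfSquares).
        sum += (Double) t.get(0);
        count += (Long) t.get(1);
        sumOfSquares += (Double) t.get(2);
    }

    // Emit the merged partial in the same (sum, count, sumOfSquares) layout.
    output.set(0, sum);
    output.set(1, count);
    output.set(2, sumOfSquares);

    return output;
}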
{code}

> VAR() Variance UDF
> ------------------
>
>                 Key: PIG-1150
>                 URL: https://issues.apache.org/jira/browse/PIG-1150
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.5.0
>         Environment: UDF, written in Pig 0.5 contrib/
>            Reporter: Russell Jurney
>             Fix For: 0.7.0
>
>         Attachments: var.patch
>
>
> I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates variance in
> a distributed manner, based on the AVG() builtin.  It works by calculating the count, sum
> and sum of squares, as described here: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
> Is this a worthwhile contribution?  Taking the square root of this value using the contrib
> SQRT() function gives Standard Deviation, which is missing from Pig.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

```
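The point of the comment above, that the combine step must add up partial (sum, count, sum of squares) triples instead of re-reading raw values, can be sketched without any Pig dependencies. This is only an illustration, not the code from var.patch: the `VarianceSketch` class, the `Partial` holder, and the `initial`/`combine`/`variance` method names are hypothetical stand-ins for the three Algebraic stages, and population variance (dividing by the count) is an assumption, since the message does not say which variant the patch computes.

```java
import java.util.Arrays;
import java.util.List;

public class VarianceSketch {
    /** Hypothetical stand-in for the 3-field intermediate tuple (sum, count, sumOfSquares). */
    static final class Partial {
        final double sum;
        final long count;
        final double sumOfSquares;

        Partial(double sum, long count, double sumOfSquares) {
            this.sum = sum;
            this.count = count;
            this.sumOfSquares = sumOfSquares;
        }
    }

    /** Initial stage: the only place that sees raw values and squares them. */
    static Partial initial(double[] values) {
        double sum = 0;
        double sumOfSquares = 0;
        for (double v : values) {
            sum += v;
            sumOfSquares += v * v;
        }
        return new Partial(sum, values.length, sumOfSquares);
    }

    /** Intermediate stage: merges partials field by field, with no squaring. */
    static Partial combine(List<Partial> partials) {
        double sum = 0;
        long count = 0;
        double sumOfSquares = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
            sumOfSquares += p.sumOfSquares;
        }
        return new Partial(sum, count, sumOfSquares);
    }

    /** Final stage: population variance, E[x^2] - (E[x])^2 (assumed variant). */
    static double variance(Partial p) {
        if (p.count == 0) {
            return Double.NaN; // no data, variance undefined
        }
        double mean = p.sum / p.count;
        return p.sumOfSquares / p.count - mean * mean;
    }

    public static void main(String[] args) {
        // Two "map-side" partials, merged as a combiner would merge them.
        Partial a = initial(new double[] {1, 2, 3});
        Partial b = initial(new double[] {4, 5});
        Partial merged = combine(Arrays.asList(a, b));
        System.out.println(variance(merged)); // prints 2.0
    }
}
```

If `combine` squared its inputs again, as if they were raw values, the sum-of-squares field would be computed from partial sums and the result would be wrong, which matches the kind of "weird results" described. Note also that the sum-of-squares formula can lose precision when the mean is large relative to the spread of the data.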