[ https://issues.apache.org/jira/browse/PIG1150?page=com.atlassian.jira.plugin.system.issuetabpanels:commenttabpanel&focusedCommentId=12791909#action_12791909 ]
Tamir Kamara commented on PIG1150:

This would be very useful for me, so I tested your patch, but I got weird results. I believe
the problem is in the combine method: it treats each tuple as if it contained the original
values, but as I understand it, combine should operate on the intermediate output (sum, count,
sum of squares) and do something like this:
{code}
static protected Tuple combine(DataBag values) throws ExecException {
    double sum = 0;
    long count = 0;
    double sumOfSquares = 0;
    Tuple output = mTupleFactory.newTuple(3);
    for (Iterator<Tuple> it = values.iterator(); it.hasNext();) {
        Tuple t = it.next();
        sum += (Double) t.get(0);
        count += (Long) t.get(1);
        sumOfSquares += (Double) t.get(2);
    }
    output.set(0, sum);
    output.set(1, count);
    output.set(2, sumOfSquares);
    return output;
}
{code}
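To make the three stages concrete, here is a minimal standalone sketch of the same idea without any Pig classes. The names (VarianceSketch, Partial, initial, combine, variance) are hypothetical and not taken from the patch. It assumes the final stage computes population variance as sumOfSquares/count - mean^2, which may not match what the patch does:

```java
import java.util.List;

// Hypothetical illustration only; not the code from var.patch.
final class VarianceSketch {

    /** Intermediate state, mirroring the 3-field tuple (sum, count, sum of squares). */
    static final class Partial {
        final double sum;
        final long count;
        final double sumOfSquares;

        Partial(double sum, long count, double sumOfSquares) {
            this.sum = sum;
            this.count = count;
            this.sumOfSquares = sumOfSquares;
        }
    }

    /** Initial stage: fold raw values into one partial result. */
    static Partial initial(double... values) {
        double sum = 0;
        double sumOfSquares = 0;
        for (double v : values) {
            sum += v;
            sumOfSquares += v * v;
        }
        return new Partial(sum, values.length, sumOfSquares);
    }

    /** Intermediate stage: add partials field by field; inputs are never raw values. */
    static Partial combine(List<Partial> partials) {
        double sum = 0;
        long count = 0;
        double sumOfSquares = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
            sumOfSquares += p.sumOfSquares;
        }
        return new Partial(sum, count, sumOfSquares);
    }

    /** Final stage (assumed): population variance from the merged partial. */
    static double variance(Partial p) {
        if (p.count == 0) {
            throw new IllegalArgumentException("variance of an empty input is undefined");
        }
        double mean = p.sum / p.count;
        return p.sumOfSquares / p.count - mean * mean;
    }
}
```

Because combine only adds fields, it can run any number of times on its own output, which is what the combiner needs. Note that the sum-of-squares formula can lose precision when the mean is large relative to the spread of the values.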
> VAR() Variance UDF
> 
>
> Key: PIG1150
> URL: https://issues.apache.org/jira/browse/PIG1150
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.5.0
> Environment: UDF, written in Pig 0.5 contrib/
> Reporter: Russell Jurney
> Fix For: 0.7.0
>
> Attachments: var.patch
>
>
> I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates variance in
> a distributed manner, based on the AVG() builtin. It works by calculating the count, sum
> and sum of squares, as described here: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
> Is this a worthwhile contribution? Taking the square root of this value using the contrib
> SQRT() function gives Standard Deviation, which is missing from Pig.

