From "Pi Song (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-277) UDF for computing correlation and covariance between data sets
Date Mon, 23 Jun 2008 15:02:44 GMT
Pi Song commented on PIG-277:
-----------------------------

Good work!
- Please be a bit more careful with code formatting.
- Please convert tabs to spaces (we use 4 spaces per tab).

Covariance
- COV.combine: What does this line do?
{noformat}
Tuple tuple = new Tuple(Integer.valueOf(values.size()+"").intValue());
{noformat}
- This looks a bit ugly:
{noformat}
catch(RuntimeException t) {
throw new RuntimeException(t.getMessage() + ": " + input, t);
}
{noformat}
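On the COV.combine line: `values.size()` already returns an `int`, so turning it into a `String` and parsing it back changes nothing. A minimal plain-Java sketch of the equivalence follows, with a `java.util.List` standing in for the patch's `values`. The class and method names are hypothetical, and Pig's `Tuple` is left out.

```java
import java.util.Arrays;
import java.util.List;

public class SizeConversion {
    // The quoted pattern: int -> String -> Integer -> int.
    static int roundTrip(List<?> values) {
        return Integer.valueOf(values.size() + "").intValue();
    }

    // Same result: size() already returns an int.
    static int direct(List<?> values) {
        return values.size();
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(1.0, 2.0, 3.0);
        // prints "3 3"
        System.out.println(roundTrip(values) + " " + direct(values));
    }
}
```

If the patch's `values` behaves the same way, the line could presumably be shortened to `new Tuple(values.size())`.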

Correlation
{noformat}
int totalSchemas = Double.valueOf(((1+Math.sqrt(1+4*combined.arity()))/2)).intValue();
{noformat}
I think we may have problems with this line. The Javadoc says .intValue() truncates the fractional part, so a value such as 2.9999999 would become 2 rather than 3.
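Reading the formula (not the patch), it solves arity = k(k-1) for k, the number of schemas: k = (1 + sqrt(1 + 4 * arity)) / 2. One defensive option, sketched here as an assumption rather than the patch's code, is to round instead of truncating and then check that k(k-1) reproduces the arity, so an unexpected tuple size fails loudly. The class and method names are hypothetical.

```java
public class SchemaCount {
    /**
     * Recovers k from arity = k * (k - 1). Rounds instead of truncating,
     * then verifies the result so a malformed arity throws instead of
     * silently producing a wrong schema count.
     */
    static int totalSchemas(int arity) {
        long k = Math.round((1 + Math.sqrt(1 + 4.0 * arity)) / 2);
        if (k * (k - 1) != arity) {
            throw new IllegalArgumentException(
                "arity " + arity + " is not of the form k*(k-1)");
        }
        return (int) k;
    }

    public static void main(String[] args) {
        System.out.println(totalSchemas(6));  // prints 3 (3 * 2 = 6)
        System.out.println(totalSchemas(12)); // prints 4 (4 * 3 = 12)
    }
}
```

Since `Math.sqrt` returns the correctly rounded square root, the original expression is exact whenever the arity really is k(k-1); the rounding and the check mainly guard against an arity that does not fit that form.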

> UDF for computing correlation and covariance between data sets
> --------------------------------------------------------------
>
>                 Key: PIG-277
>                 URL: https://issues.apache.org/jira/browse/PIG-277
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Ajay Garg
>            Priority: Minor
>         Attachments: stat.patch
>
>
> UDFs for computing correlation and covariance between data sets. Use the following commands to compute covariance:
> A = load 'input.xml' using PigStorage(':');
> B = group A all;
> define c COV('a','b','c');
> D = foreach B generate group, c(A.$0, A.$1, A.$2);
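For reference, here is a minimal sketch of what the two statistics compute for one pair of columns. It assumes the sample (n - 1) covariance, which the patch may or may not use, and the class and method names are hypothetical; this is not the COV/COR UDF API.

```java
public class PairStats {
    // Sample covariance: sum((x_i - mean_x) * (y_i - mean_y)) / (n - 1).
    static double covariance(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length arrays with n >= 2");
        }
        double mx = mean(x), my = mean(y);
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            sum += (x[i] - mx) * (y[i] - my);
        }
        return sum / (x.length - 1);
    }

    // Pearson correlation: cov(x, y) / (sd(x) * sd(y)); the (n - 1) factors cancel.
    static double correlation(double[] x, double[] y) {
        return covariance(x, y) / Math.sqrt(covariance(x, x) * covariance(y, y));
    }

    static double mean(double[] v) {
        double s = 0;
        for (double d : v) s += d;
        return s / v.length;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4};
        double[] b = {2, 4, 6, 8};
        System.out.println(covariance(a, b));  // about 3.333 (= 10 / 3)
        System.out.println(correlation(a, b)); // about 1.0, since b = 2 * a
    }
}
```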

