pig-dev mailing list archives

From "Hiten Java (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-3668) COR built-in function when atleast one of the coefficient values is NaN
Date Wed, 15 Jan 2014 14:21:22 GMT

     [ https://issues.apache.org/jira/browse/PIG-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hiten Java updated PIG-3668:

    Patch Info:   (was: Patch Available)

> COR built-in function when atleast one of the coefficient values is NaN
> -----------------------------------------------------------------------
>                 Key: PIG-3668
>                 URL: https://issues.apache.org/jira/browse/PIG-3668
>             Project: Pig
>          Issue Type: Bug
>          Components: internal-udfs
>    Affects Versions: 0.12.0
>            Reporter: Hiten Java
>         Attachments: COR.diff
> When multiple columns are passed to COR for correlation analysis and the coefficient for one
> of the column pairs is NaN, the coefficients for all the other pairs are not computed.
> The Pearson coefficient is NaN when all values in a given column are the same.
> Example:
> A = LOAD 'myData' USING org.apache.hcatalog.pig.HCatLoader();
> B = group A all;
> c = foreach B generate group, FLATTEN(COR((bag{tuple(double)}) A.col_1, (bag{tuple(double)}) A.col_2,
>     (bag{tuple(double)}) A.col_3, (bag{tuple(double)}) A.col_4));
> If the Pearson coefficient for col_1 and col_2 is NaN, then the coefficients for all
> combinations are reported as NaN.
> This happens because of the 'return null' statement in the catch block on lines 157 and
> 235 of org.apache.pig.builtin.COR.java.
> If the catch block is removed, the correlation analysis continues for the remaining
> columns. (Apache Pig 0.12.0)
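
For illustration only, the behaviour described above can be sketched in plain Java. This is not the code in org.apache.pig.builtin.COR; the class name CorSketch and the methods pearson and pairwise are hypothetical, and the textbook raw-sum formula is an assumption about how the coefficient might be computed. The sketch shows why a constant column yields NaN (zero variance gives 0.0 / 0.0) and how evaluating each column pair independently keeps one NaN from hiding the other results.

```java
import java.util.Arrays;

// Illustrative sketch only; hypothetical names, not Pig's COR implementation.
class CorSketch {

    // Pearson coefficient computed from raw sums (assumed textbook formula).
    // If either column is constant, its variance term is 0, so the result
    // is 0.0 / 0.0, which is NaN in IEEE 754 arithmetic.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("columns must have equal length");
        }
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0, sumYY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
            sumYY += y[i] * y[i];
        }
        double numerator = n * sumXY - sumX * sumY;
        double denominator =
            Math.sqrt((n * sumXX - sumX * sumX) * (n * sumYY - sumY * sumY));
        return numerator / denominator;
    }

    // Coefficients for every pair of columns, in the order
    // (0,1), (0,2), ..., (k-2,k-1). Each pair is handled on its own:
    // a failure in one pair is recorded as NaN for that pair instead of
    // abandoning the whole result.
    static double[] pairwise(double[][] columns) {
        int k = columns.length;
        double[] result = new double[k * (k - 1) / 2];
        int idx = 0;
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                try {
                    result[idx] = pearson(columns[i], columns[j]);
                } catch (RuntimeException e) {
                    result[idx] = Double.NaN; // keep going with the other pairs
                }
                idx++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The second column is constant, so every pair involving it is NaN,
        // while the pair of the first and third columns is still computed.
        double[][] columns = { {1, 2, 3}, {5, 5, 5}, {2, 4, 6} };
        System.out.println(Arrays.toString(pairwise(columns)));
    }
}
```

The design choice in pairwise mirrors the suggestion in the report: a NaN or an error for one combination is kept local to that combination, and the remaining columns are still analysed.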

This message was sent by Atlassian JIRA
