hadoop-pig-dev mailing list archives

From "Richard Ding (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-1525) Incorrect data generated by diff of SUM
Date Thu, 29 Jul 2010 22:00:21 GMT
Incorrect data generated by diff of SUM
---------------------------------------

                 Key: PIG-1525
                 URL: https://issues.apache.org/jira/browse/PIG-1525
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Richard Ding
            Assignee: Richard Ding
             Fix For: 0.8.0


Given the following data:

input1:

{code}
id9     0
{code}

input2:

{code}
id8     1
id9     1
{code}

the following Pig script

{code}
A = LOAD 'input1' AS (id:chararray, val:long);
B = LOAD 'input2' AS (id:chararray, val:long);
C = COGROUP A BY id, B BY id;
D = FOREACH C GENERATE group, SUM(B.val), SUM(A.val), (SUM(A.val) - SUM(B.val));
dump D;
{code}

generates incorrect data (for id9 the difference SUM(A.val) - SUM(B.val) should be 0 - 1 = -1, not -2):

{code}
(id8,1L,,)
(id9,1L,0L,-2L)
{code}

The workaround is to replace the FOREACH statement with

{code}
D = FOREACH C GENERATE group, SUM(B.val) as b, SUM(A.val) as a;
E = FOREACH D GENERATE $0, b, a, (a-b);
{code}
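
For reference, the workaround can be assembled into a complete script, sketched here. It reuses the relation names and input paths from the report; the alias diff and the comments are additions for illustration only, and the script has not been rerun against 0.7.0.

{code}
-- Load both inputs with the schema used in the report.
A = LOAD 'input1' AS (id:chararray, val:long);
B = LOAD 'input2' AS (id:chararray, val:long);

-- Group both relations by id; an id missing from one side yields an empty bag.
C = COGROUP A BY id, B BY id;

-- Step 1: compute each SUM once and give it an alias.
D = FOREACH C GENERATE group, SUM(B.val) AS b, SUM(A.val) AS a;

-- Step 2: subtract the aliased sums in a separate FOREACH.
-- The alias "diff" is illustrative and not part of the original report.
E = FOREACH D GENERATE $0, b, a, (a - b) AS diff;

DUMP E;
{code}

Splitting the computation this way means the subtraction works on the already-computed fields a and b rather than on two SUM calls within the same GENERATE. For id8, where A contributes no rows, SUM(A.val) is null, so the difference is null as well.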


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

