hadoop-pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Justin Hu (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-1543) IsEmpty returns the wrong value after using LIMIT
Date Thu, 12 Aug 2010 21:00:16 GMT
IsEmpty returns the wrong value after using LIMIT
-------------------------------------------------

                 Key: PIG-1543
                 URL: https://issues.apache.org/jira/browse/PIG-1543
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Justin Hu


1. Two input files:

1a: limit_empty.input_a
1
1
1

1b: limit_empty.input_b
2
2

2.
The pig script: limit_empty.pig

-- A contains only 1's & B contains only 2's
A = load 'limit_empty.input_a' as (a1:int);
B = load 'limit_empty.input_a' as (b1:int);

C =COGROUP A by a1, B by b1;
D = FOREACH C generate A, B, (IsEmpty(A)? 0:1), (IsEmpty(B)? 0:1), COUNT(A), COUNT(B);
store D into 'limit_empty.output/d';
-- After the script done, we see the right results:
-- {(1),(1),(1)}   {}      1       0       3       0
-- {}         {(2),(2)}      0       1       0       2

C1 = foreach C { Alim = limit A 1; Blim = limit B 1; generate Alim, Blim; }
D1 = FOREACH C1 generate Alim,Blim, (IsEmpty(Alim)? 0:1), (IsEmpty(Blim)? 0:1), COUNT(Alim),
COUNT(Blim);
store D1 into 'limit_empty.output/d1';
-- After the script done, we see the unexpected results:
-- {(1)}   {}        1       1       1       0
-- {}      {(2)}     1       1       0       1

dump D;
dump D1;

3. Run the scrip and redirect the stdout (2 dumps) file. There are two issues:

The major one:

IsEmpty() returns FALSE for empty bag in limit_empty.output/d1/*, while IsEmpty() returns
correctly in limit_empty.output/d/*.

The difference is that one has been applied with "LIMIT" before using IsEmpty().

The minor one:

The redirected output only contains the first dump:

({(1),(1),(1)},{},1,0,3L,0L)
({},{(2),(2)},0,1,0L,2L)

We expect two more lines like:
({(1)},{},1,1,1L,0L)
({},{(2)},1,1,0L,1L)

Besides, there is error says:

[main] ERROR org.apache.pig.backend.hadoop.executionengine.HJob - java.lang.ClassCastException:
java.lang.Integer cannot be cast to org.apache.pig.data.Tuple


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message