pig-dev mailing list archives

From "Richard Ding (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1543) IsEmpty returns the wrong value after using LIMIT
Date Fri, 03 Sep 2010 18:37:34 GMT

    [ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906008#action_12906008 ]

Richard Ding commented on PIG-1543:
-----------------------------------

+1. Looks good.

> IsEmpty returns the wrong value after using LIMIT
> -------------------------------------------------
>
>                 Key: PIG-1543
>                 URL: https://issues.apache.org/jira/browse/PIG-1543
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Justin Hu
>            Assignee: Daniel Dai
>             Fix For: 0.8.0
>
>         Attachments: PIG-1543-1.patch
>
>
> 1. Two input files:
> 1a: limit_empty.input_a
> 1
> 1
> 1
> 1b: limit_empty.input_b
> 2
> 2
> 2. The pig script: limit_empty.pig
> -- A contains only 1's & B contains only 2's
> A = load 'limit_empty.input_a' as (a1:int);
> B = load 'limit_empty.input_b' as (b1:int);
> C = COGROUP A by a1, B by b1;
> D = FOREACH C generate A, B, (IsEmpty(A)? 0:1), (IsEmpty(B)? 0:1), COUNT(A), COUNT(B);
> store D into 'limit_empty.output/d';
> -- After the script finishes, we see the correct results:
> -- {(1),(1),(1)}   {}      1       0       3       0
> -- {}         {(2),(2)}      0       1       0       2
> C1 = foreach C { Alim = limit A 1; Blim = limit B 1; generate Alim, Blim; }
> D1 = FOREACH C1 generate Alim, Blim, (IsEmpty(Alim)? 0:1), (IsEmpty(Blim)? 0:1), COUNT(Alim), COUNT(Blim);
> store D1 into 'limit_empty.output/d1';
> -- After the script finishes, we see unexpected results:
> -- {(1)}   {}        1       1       1       0
> -- {}      {(2)}     1       1       0       1
> dump D;
> dump D1;
> 3. Run the script and redirect stdout (the 2 dumps) to a file. There are two issues:
> The major one:
> IsEmpty() returns FALSE for an empty bag in limit_empty.output/d1/*, while IsEmpty() returns the correct value in limit_empty.output/d/*.
> The difference is that in the failing case "LIMIT" was applied to the bags before calling IsEmpty().
> The minor one:
> The redirected output only contains the first dump:
> ({(1),(1),(1)},{},1,0,3L,0L)
> ({},{(2),(2)},0,1,0L,2L)
> We expect two more lines like:
> ({(1)},{},1,1,1L,0L)
> ({},{(2)},1,1,0L,1L)
> In addition, the following error is reported:
> [main] ERROR org.apache.pig.backend.hadoop.executionengine.HJob - java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.pig.data.Tuple
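
For reference, the values that D1 would be expected to hold can be modelled in plain Python. This is only a sketch of the intended semantics, not Pig's implementation: the helper names (cogroup, limit, is_empty, expected_d1) are hypothetical, and the inputs assume limit_empty.input_a holds three 1's and limit_empty.input_b holds two 2's, as the output of D implies.

```python
# Plain-Python model of the expected semantics; this is not Pig code.
from collections import defaultdict


def cogroup(a, b):
    """Group two lists of ints by value, returning (key, bag_a, bag_b) rows."""
    groups = defaultdict(lambda: ([], []))
    for v in a:
        groups[v][0].append(v)
    for v in b:
        groups[v][1].append(v)
    return [(k, bag_a, bag_b) for k, (bag_a, bag_b) in sorted(groups.items())]


def limit(bag, n):
    """Keep at most n elements of a bag (models LIMIT inside a nested FOREACH)."""
    return bag[:n]


def is_empty(bag):
    """Models IsEmpty(): true only when the bag has no tuples."""
    return len(bag) == 0


def expected_d1(a, b):
    """Rows of (Alim, Blim, flag_a, flag_b, count_a, count_b), as in D1."""
    rows = []
    for _, bag_a, bag_b in cogroup(a, b):
        alim, blim = limit(bag_a, 1), limit(bag_b, 1)
        rows.append((alim, blim,
                     0 if is_empty(alim) else 1,   # (IsEmpty(Alim)? 0:1)
                     0 if is_empty(blim) else 1,   # (IsEmpty(Blim)? 0:1)
                     len(alim), len(blim)))        # COUNT(Alim), COUNT(Blim)
    return rows


rows = expected_d1([1, 1, 1], [2, 2])
for row in rows:
    print(row)
```

In this model the flag for a limited bag that is still empty stays 0, so the two flag columns match those of D (1, 0 and 0, 1) rather than the 1, 1 reported above.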

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

