pig-dev mailing list archives

From "Daniel Dai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-3379) Alias reuse in nested foreach causes PIG script to fail
Date Fri, 23 Aug 2013 01:32:52 GMT

    [ https://issues.apache.org/jira/browse/PIG-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748184#comment-13748184 ]

Daniel Dai commented on PIG-3379:
---------------------------------

LODistinct is missing from the posted logical plan. It should be:
{code}
    |---EventsPerMinute: (Name: LOForEach Schema: timeStamp#56:long,nbDevices#57:long,nbDevicesWatching#58:long)
        |   |
        |   (Name: LOGenerate[false,false,false] Schema: timeStamp#56:long,nbDevices#57:long,nbDevicesWatching#58:long)ColumnPrune:InputUids=[50, 49]ColumnPrune:OutputUids=[58, 57, 56]
        |   |   |
        |   |   (Name: Multiply Type: long Uid: 56)
        |   |   |
        |   |   |---group:(Name: Project Type: long Uid: 49 Input: 0 Column: (*))
        |   |   |
        |   |   |---(Name: Cast Type: long Uid: 54)
        |   |       |
        |   |       |---(Name: Constant Type: int Uid: 54)
        |   |   |
        |   |   (Name: UserFunc(org.apache.pig.builtin.BagSize) Type: long Uid: 57)
        |   |   |
        |   |   |---DistinctDevices:(Name: Project Type: bag Uid: 50 Input: 1 Column: (*))
        |   |   |
        |   |   (Name: UserFunc(org.apache.pig.builtin.BagSize) Type: long Uid: 58)
        |   |   |
        |   |   |---DistinctDevices:(Name: Project Type: bag Uid: 50 Input: 2 Column: (*))
        |   |
        |   |---(Name: LOInnerLoad[0] Schema: group#49:long)
        |   |
        |   |---DistinctDevices: (Name: LODistinct Schema: deviceId#22:chararray)
        |   |   |
        |   |   |---1-3: (Name: LOForEach Schema: deviceId#22:chararray)
        |   |       |   |
        |   |       |   (Name: LOGenerate[false] Schema: deviceId#22:chararray)
        |   |       |   |   |
        |   |       |   |   deviceId:(Name: Project Type: chararray Uid: 22 Input: 0 Column: (*))
        |   |       |   |
        |   |       |   |---(Name: LOInnerLoad[1] Schema: deviceId#22:chararray)
        |   |       |
        |   |       |---Events: (Name: LOInnerLoad[1] Schema: eventTime#21:long,deviceId#22:chararray,eventName#23:chararray)
{code}

The plan looks right.

Talked with [~xuefuz]. The idea is to use projectedOperator instead of the alias at the time we
convert the alias to a position. The newly introduced projectedOperator is only used in alias
translation. After that, input# and col# will be used as the coordinates of ProjectExpression.
The patch looks good. I will commit it once tests pass.
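
To illustrate why resolving by operator rather than by alias matters, here is a toy
model in Python. It is only a sketch and not Pig's actual code: the class names
(Operator, Project) and the functions resolve_by_alias and resolve_by_operator are
hypothetical, and the "first matching alias wins" lookup is an assumption made for
the demonstration. The real failure mode inside Pig may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # eq=False: operators compare by identity, not by field values
class Operator:
    kind: str   # e.g. "LODistinct" (label only, hypothetical model)
    alias: str  # alias the script assigned to this operator


@dataclass
class Project:
    alias: str                                    # alias as written in the script
    projected_operator: Optional[Operator] = None  # operator the alias meant at parse time
    input_num: Optional[int] = None               # filled in by alias translation


def resolve_by_alias(projects: List[Project], inputs: List[Operator]) -> List[int]:
    """Assumed naive translation: pick the first input whose alias matches."""
    for p in projects:
        p.input_num = next(i for i, op in enumerate(inputs) if op.alias == p.alias)
    return [p.input_num for p in projects]


def resolve_by_operator(projects: List[Project], inputs: List[Operator]) -> List[int]:
    """Translation keyed on the operator object recorded at parse time."""
    for p in projects:
        p.input_num = next(i for i, op in enumerate(inputs) if op is p.projected_operator)
    return [p.input_num for p in projects]


# Inner plan of the nested foreach: input 0 is the group key,
# inputs 1 and 2 both carry the reused alias "DistinctDevices".
group_load = Operator("LOInnerLoad", "group")
distinct = Operator("LODistinct", "DistinctDevices")
filtered = Operator("LOFilter", "DistinctDevices")
inputs = [group_load, distinct, filtered]


def make_projects() -> List[Project]:
    # nbDevices refers to the DISTINCT, nbDevicesWatching to the FILTER.
    return [
        Project("DistinctDevices", projected_operator=distinct),
        Project("DistinctDevices", projected_operator=filtered),
    ]


by_alias = resolve_by_alias(make_projects(), inputs)
by_operator = resolve_by_operator(make_projects(), inputs)
print(by_alias)     # [1, 1]  both projections collapse onto the same input
print(by_operator)  # [1, 2]  each projection keeps its own input
```

In this model the operator reference is needed only during translation. Once
input_num is set, the alias and the operator reference play no further part, which
mirrors the statement that input# and col# serve as the coordinates afterwards.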
                
> Alias reuse in nested foreach causes PIG script to fail
> -------------------------------------------------------
>
>                 Key: PIG-3379
>                 URL: https://issues.apache.org/jira/browse/PIG-3379
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.11.1
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>         Attachments: PIG-3379-draft.patch, PIG-3379.patch
>
>
> The following script fails:
> {code:title=temp.pig}
> Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, eventName:chararray);
> Events = FOREACH Events GENERATE eventTime, deviceId, eventName;
> EventsPerMinute = GROUP Events BY (eventTime / 60000);
> EventsPerMinute = FOREACH EventsPerMinute {
>   DistinctDevices = DISTINCT Events.deviceId;
>   nbDevices = SIZE(DistinctDevices);
>   DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat';
>   nbDevicesWatching = SIZE(DistinctDevices);
>   GENERATE $0*60000 as timeStamp, nbDevices as nbDevices, nbDevicesWatching as nbDevicesWatching;
> }
> EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0  AND timeStamp < 100000;
> A = FOREACH EventsPerMinute GENERATE timeStamp;
> describe A;
> {code}
> With the error:
> {code}
> 2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: 
> <file /home/xzhang/Documents/temp.pig, line 14, column 37> Invalid field projection. Projected field [timeStamp] does not exist in schema: deviceId:chararray.
> {code}
> Using a distinct alias name for the 2nd "DistinctDevices" fixes the problem. As an observation,
> removing the last filter statement also fixes the problem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
