pig-dev mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-3347) Store invocation brings side effect
Date Tue, 04 Feb 2014 18:08:14 GMT

    [ https://issues.apache.org/jira/browse/PIG-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890942#comment-13890942 ]

Koji Noguchi commented on PIG-3347:
-----------------------------------

bq. UID is to track column lineage so in logical optimizer, so that we can freely move operate
up and down,  ProjectionPatcher will reposition the column according to uid

I think part of my confusion comes from these two uses.  The UID is used (1) to track column
lineage, and (2) by ProjectionPatcher to reposition columns, which requires the UID to be
unique within each relation.

Because of (2), a new UID is created whenever a column is referenced more than once.
For example:
A = load 'a.txt' as (a:int);
B = foreach A generate a as col1, a as col2; 

This would create a schema like 
{noformat}
1-2: (Name: LOStore Schema: col1#1:int,col2#2:int)
...
    |---A: (Name: LOLoad Schema: a#1:int)RequiredFields:null
{noformat}

So without traversing the lineage, I cannot connect 'col2' to the original 'a'.
However, optimizer rules like PushUpFilter and FilterAboveForeach seem to use just the UID
to determine field usage...
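
To make the concern concrete, here is a toy model of the two uses of the UID. It is only a sketch and not Pig's actual code: the function names (load, foreach, origin) and the parent-UID lineage map are hypothetical, chosen to reproduce the col1#1 / col2#2 schema shown above.

```python
from itertools import count

# Toy model only: these names are hypothetical and do not mirror Pig's classes.
_next_uid = count(1)


def load(names):
    """Give every loaded column a fresh UID; return (schema, lineage)."""
    schema = [(name, next(_next_uid)) for name in names]
    return schema, {}


def foreach(schema, projections, lineage):
    """Project input columns, given as (source_name, alias) pairs.

    The first reference to a source column keeps its UID. Any later
    reference gets a fresh UID so that UIDs stay unique within the
    relation, and the fresh UID's parent is recorded in the lineage map.
    """
    by_name = dict(schema)
    lineage = dict(lineage)
    seen = set()
    out = []
    for source, alias in projections:
        uid = by_name[source]
        if uid in seen:
            new_uid = next(_next_uid)
            lineage[new_uid] = uid  # remember where the new UID came from
            uid = new_uid
        seen.add(uid)
        out.append((alias, uid))
    return out, lineage


def origin(uid, lineage):
    """Follow parent links back to the UID assigned at load time."""
    while uid in lineage:
        uid = lineage[uid]
    return uid


# A = load 'a.txt' as (a:int);
a_schema, lineage = load(["a"])
# B = foreach A generate a as col1, a as col2;
b_schema, lineage = foreach(a_schema, [("a", "col1"), ("a", "col2")], lineage)

a_uid = a_schema[0][1]
# Matching on the UID alone finds only col1 ...
by_uid_only = [alias for alias, uid in b_schema if uid == a_uid]
# ... while walking the lineage finds both col1 and col2.
by_lineage = [alias for alias, uid in b_schema if origin(uid, lineage) == a_uid]
```

In this model a rule that compares UIDs directly sees only col1 as a use of 'a', while a rule that walks the lineage sees both col1 and col2. Whether the real optimizer rules behave like the first case is exactly the open question here.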

But this is outside the scope of this jira.  I need to spend more time learning how the Pig
compiler works.

> Store invocation brings side effect
> -----------------------------------
>
>                 Key: PIG-3347
>                 URL: https://issues.apache.org/jira/browse/PIG-3347
>             Project: Pig
>          Issue Type: Bug
>          Components: grunt
>    Affects Versions: 0.11
>         Environment: local mode
>            Reporter: Sergey
>            Assignee: Daniel Dai
>            Priority: Critical
>             Fix For: 0.12.1
>
>         Attachments: PIG-3347-1.patch, PIG-3347-2-testonly.patch, PIG-3347-3.patch, PIG-3347-4-testonly.patch
>
>
> The problem is that an intermediate 'store' invocation "changes" the final store output.
> It looks like it introduces some kind of side effect. We used 'local' mode to run the script.
> Here is the input data:
> 1
> 1
> Here is the script:
> {code}
> a = load 'test';
> a_group = group a by $0;
> b = foreach a_group {
>   a_distinct = distinct a.$0;
>   generate group, a_distinct;
> }
> --store b into 'b';
> c = filter b by SIZE(a_distinct) == 1;
> store c into 'out';
> {code}
> We expect the output to be:
> 1 1
> The output is an empty file.
> Uncomment the {code}--store b into 'b';{code} line and see the difference.
> You would get the expected output.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)