pig-dev mailing list archives

From "Jonathan Coveney (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2687) Add relation/operator scoping to Pig
Date Wed, 09 May 2012 00:07:50 GMT

    [ https://issues.apache.org/jira/browse/PIG-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270952#comment-13270952 ]

Jonathan Coveney commented on PIG-2687:
---------------------------------------

The downside of the approach of renaming within the block is as follows:

{code}
a = load 'thing' as (x:int);
b = foreach (group a all) {
    a = distinct a;
    generate a;
}
{code}

In this case, you know that the distinct operates on the globally scoped a, so the output of
the distinct should be given a renamed alias, sequential_prefix_a.

However, if you did:
{code}
a = load 'thing' as (x:int);
b = foreach a generate x, x as y;
c = foreach (group a all) {
    a = distinct a;
    b = limit a 100;
    generate b;
}
{code}

In this case, the distinct is on the global a, but the limit is not (it reads the a defined
inside the block). So you're going to have to check which aliases are already defined anyway,
in order to know whether you need to do the replacement or not: you only do it if there is a
clash. Hmm. I will think about the cleanest implementation. Ideally I want to avoid a bunch of
lookups, but they may be unavoidable, and this may still be the cleanest way...
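Here is a minimal sketch of the clash-only renaming described above. It is written in Python for illustration and is not Pig's actual Java implementation. It models a block as a list of statements, each with an output alias and the aliases it reads. Stmt, rename_block, and the sequential_prefix_N_ naming scheme are all hypothetical. Reads are resolved before the statement's own definition is recorded, so a = distinct a still reads the global a. A definition is renamed only when it collides with a global alias.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Stmt:
    """One statement inside a block: alias = op(inputs) (hypothetical model)."""
    alias: str
    op: str
    inputs: List[str] = field(default_factory=list)


def rename_block(
    stmts: List[Stmt],
    global_aliases: Set[str],
    prefix: str = "sequential_prefix_",
) -> Tuple[List[Stmt], Dict[str, str]]:
    """Rename aliases defined in a block only when they clash with a global alias.

    Returns the rewritten statements plus a map from each alias as written in
    the block to the name it was finally bound to.
    """
    local: Dict[str, str] = {}  # alias as written -> name actually emitted
    counter = 0
    out: List[Stmt] = []
    for s in stmts:
        # Resolve reads first: an alias already defined earlier in the block
        # refers to the local (possibly renamed) relation; anything else
        # refers to the global alias unchanged.
        inputs = [local.get(name, name) for name in s.inputs]
        if s.alias in global_aliases:
            # Clash with a global alias: emit a fresh name instead.
            # Assumes the prefix never collides with user-chosen aliases.
            new_name = f"{prefix}{counter}_{s.alias}"
            counter += 1
        else:
            new_name = s.alias  # no clash, keep the name as written
        local[s.alias] = new_name
        out.append(Stmt(new_name, s.op, inputs))
    return out, local


# Second example from the comment: a and b are both defined globally.
rewritten, names = rename_block(
    [Stmt("a", "DISTINCT", ["a"]), Stmt("b", "LIMIT 100", ["a"])],
    global_aliases={"a", "b"},
)
for s in rewritten:
    print(s.alias, "=", s.op, s.inputs)
print("generate", names["b"])
```

Run as-is, it prints three lines. The first is sequential_prefix_0_a = DISTINCT ['a'], the second is sequential_prefix_1_b = LIMIT 100 ['sequential_prefix_0_a'], and the third is generate sequential_prefix_1_b. The DISTINCT reads the global a, while the LIMIT reads the renamed local one. The dictionary lookups are the "bunch of lookups" the comment expects. Each is an average-case constant-time hash lookup, so a single pass over the block stays linear in its length.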
                
> Add relation/operator scoping to Pig
> ------------------------------------
>
>                 Key: PIG-2687
>                 URL: https://issues.apache.org/jira/browse/PIG-2687
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Priority: Minor
>             Fix For: 0.11
>
>
> The idea is to add a real notion of scope that can be used to manage namespaces. This would
> mean adding blocks to Pig, probably with some sort of syntax like this...
> {code}
> a = load 'thing' as (x:int, y:int);
> b = foreach a generate x, y, x*y as z;
> {
>   a = group b by z;
>   b = foreach a generate COUNT(b);
>   global b;
> }
> {code}
> which would replace the alias b with the nested b value in the scope. This could also be used
> in nested foreach blocks, and macros could just become blocks as well.
> I am 95% sure about how to implement this... I have a failed patch attempt, and need to study
> a bit more about how Pig uses its logical operators.
> Any thoughts?
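This minimal Python sketch (not Pig code) makes the proposed global semantics concrete. It uses a chain of scopes to replay the block from the description. Scope, define, resolve, export_global, and build_example are hypothetical names, and relation values are just descriptive strings. The sketch assumes global b; writes into the outermost scope, as Python's own global statement does. Binding into the immediately enclosing scope instead would be an equally plausible reading of the proposal.

```python
from typing import Dict, Optional


class Scope:
    """Lexical scope mapping aliases to relation values (hypothetical model)."""

    def __init__(self, parent: Optional["Scope"] = None) -> None:
        self.parent = parent
        self.bindings: Dict[str, str] = {}

    def define(self, alias: str, value: str) -> None:
        # An assignment always binds in the innermost scope, shadowing outer ones.
        self.bindings[alias] = value

    def resolve(self, alias: str) -> str:
        # Walk outward until some enclosing scope defines the alias.
        scope: Optional["Scope"] = self
        while scope is not None:
            if alias in scope.bindings:
                return scope.bindings[alias]
            scope = scope.parent
        raise KeyError(f"undefined alias: {alias}")

    def export_global(self, alias: str) -> None:
        # global b;  copies the local binding of b into the outermost scope,
        # replacing whatever that scope had bound to b.
        root = self
        while root.parent is not None:
            root = root.parent
        root.bindings[alias] = self.bindings[alias]


def build_example() -> Scope:
    """Replays the block from the issue description; values are plan strings."""
    top = Scope()
    top.define("a", "load('thing')")
    top.define("b", f"foreach({top.resolve('a')})")
    block = Scope(parent=top)
    block.define("a", f"group({block.resolve('b')}, z)")  # reads the outer b
    block.define("b", f"count({block.resolve('a')})")  # reads the inner a
    block.export_global("b")
    return top  # the block scope is dropped; only the exported b survives


top = build_example()
print(top.resolve("a"))
print(top.resolve("b"))
```

After the block, a still refers to the load. The global b now holds the nested group-and-count plan, which is the replacement the description calls for. The inner a never leaks out because the block scope is simply discarded.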

--
This message is automatically generated by JIRA.

        
