pig-dev mailing list archives

From "Thejas M Nair (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1834) relation-as-scalar - uses the last statement associated with the scalar alias
Date Fri, 28 Jan 2011 21:44:44 GMT

     [ https://issues.apache.org/jira/browse/PIG-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1834:
-------------------------------

    Description: 
Pig allows a relation alias to be reused, i.e. to refer to different relations (statements)
at different points in a script. I have not seen this documented, but I have seen people
write such queries.

For example -
{code}
l = load 'x' as (a,b);
l = filter l by a > 1;
l = foreach ...
store l into 'y';
{code}

At any point in the query, the alias "l" represents the relation most recently assigned
to it in the portion of the Pig query above that point.

But with the relation-as-scalar feature, the alias is bound to the last relation
associated with it in the entire script, including statements that come after the scalar reference.

For example -
{code}
l = load 'x' as (a,b);
A = load 'x' as (a,b);
B = foreach A generate a, l.a as la;
l = foreach l generate a+1 as a;
store B into 'b';
{code}

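The scalar reference l.a in the statement that defines B should refer to the load, but it
refers to the later foreach statement. In the plan below, the file read by ReadScalars is
the one stored from the foreach -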
{code}

#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-16
Map Plan
l: Store(file:/tmp/temp-953430379/tmp2006282146:org.apache.pig.impl.io.InterStorage) - scope-8
|
|---l: New For Each(false)[bag] - scope-7
    |   |
    |   Add[int] - scope-5
    |   |
    |   |---Cast[int] - scope-3
    |   |   |  
    |   |   |---Project[bytearray][0] - scope-2
    |   |
    |   |---Constant(1) - scope-4
    |
    |---l: Load(file:///Users/tejas/pig_type/trunk/x:org.apache.pig.builtin.PigStorage) - scope-1--------
Global sort: false
----------------

MapReduce node scope-17
Map Plan
B: Store(file:///Users/tejas/pig_type/trunk/b:org.apache.pig.builtin.PigStorage) - scope-15
|
|---B: New For Each(false,false)[bag] - scope-14
    |   |
    |   Project[bytearray][0] - scope-9
    |   |
    |   POUserFunc(org.apache.pig.impl.builtin.ReadScalars)[int] - scope-13
    |   |
    |   |---Constant(0) - scope-11
    |   |
    |   |---Constant(file:/tmp/temp-953430379/tmp2006282146) - scope-12
    |
    |---A: Load(file:///Users/tejas/pig_type/trunk/x:org.apache.pig.builtin.PigStorage) - scope-0--------
Global sort: false
----------------
{code}
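
A possible way to sidestep the problem is sketched below. It assumes the wrong binding only
happens when the scalar alias is assigned again later in the script, so it gives the later
statement its own alias and leaves l with a single definition. The alias name l_plus_one is
made up for illustration, and this sketch is untested against 0.8.0.
{code}
-- l is assigned exactly once, so the scalar reference l.a has only one candidate relation
l = load 'x' as (a,b);
A = load 'x' as (a,b);
B = foreach A generate a, l.a as la;
-- hypothetical alias: avoids re-binding l after it has been used as a scalar
l_plus_one = foreach l generate a+1 as a;
store B into 'b';
{code}
If the assumption holds, the explain output for B should show ReadScalars reading a file
stored directly from the load of 'x' rather than from a foreach.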




> relation-as-scalar - uses the last statement associated with the scalar alias
> -----------------------------------------------------------------------------
>
>                 Key: PIG-1834
>                 URL: https://issues.apache.org/jira/browse/PIG-1834
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>             Fix For: 0.8.0, 0.9.0
>
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

