hadoop-pig-dev mailing list archives

From "Viraj Bhat (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-1633) Using an alias within Nested Foreach causes indeterminate behaviour
Date Mon, 20 Sep 2010 23:43:34 GMT
Using an alias within Nested Foreach causes indeterminate behaviour
-------------------------------------------------------------------

                 Key: PIG-1633
                 URL: https://issues.apache.org/jira/browse/PIG-1633
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.7.0, 0.6.0, 0.5.0, 0.4.0
            Reporter: Viraj Bhat


I have created a RANDOMINT function that generates random integers from 0 up to, but not
including, the specified value. For example, RANDOMINT(4) returns random integers between
0 and 3 (inclusive).

{code}
$ hadoop fs -cat rand.dat
f
g
h
i
j
k
l
m
{code}

The pig script is as follows:
{code}
register math.jar;
A = load 'rand.dat' using PigStorage() as (data);

B = foreach A {
        r = math.RANDOMINT(4);
        generate
                data,
                r as random,
                ((r == 3)?1:0) as quarter;
        };

dump B;
{code}

The results are as follows:
{code}
(f,0,0)
(g,3,0)
(h,0,0)
(i,2,0)
(j,3,0)
(k,2,0)
(l,0,1)
(m,1,0)
{code}

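Note rows such as (g,3,0) and (j,3,0), where random is 3 but quarter is 0, and (l,0,1),
where random is 0 but quarter is 1. This appears to happen because r is referenced twice
in the generate clause and each reference produces a different value.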
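
One way to read these results is that each reference to r in the generate clause triggers a fresh call to the UDF instead of reusing one value per row. The Python sketch below only mimics that hypothesis outside Pig. The names randomint and row_with_inlined_alias are hypothetical and are not part of Pig or math.jar, and the re-evaluation itself is an assumption about the cause, not a confirmed diagnosis.

```python
import random

def randomint(n, rng):
    """Hypothetical stand-in for math.RANDOMINT: an integer in [0, n)."""
    return rng.randrange(n)

def row_with_inlined_alias(data, rng):
    # Assumption: every reference to the alias r re-evaluates the UDF,
    # so the 'random' column and the 'quarter' test see separate draws.
    rand_col = randomint(4, rng)
    quarter = 1 if randomint(4, rng) == 3 else 0
    return (data, rand_col, quarter)

rng = random.Random(0)
rows = [row_with_inlined_alias(d, rng) for d in "fghijklm" * 50]

# Rows where quarter disagrees with the random column, like (j,3,0) or (l,0,1).
mismatches = [r for r in rows if (r[1] == 3) != (r[2] == 1)]
print(len(mismatches), "of", len(rows), "rows are inconsistent")
```

With two independent draws per row, a mismatch is expected on roughly three rows in eight, which is in line with the three inconsistent rows out of eight in the dump above.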

Rewriting the script as shown below avoids the issue. The M/R jobs generated by both scripts
are the same, so the nested form is only a matter of convenience.
{code}
register math.jar;
A = load 'rand.dat' using PigStorage() as (data);

B = foreach A generate
        data,
        math.RANDOMINT(4) as r;

C = foreach B generate
        data,
        r,
        ((r == 3)?1:0) as quarter;

dump C;
{code}
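
The two-step script presumably works because r is a stored column of B by the time C reads it, so both output columns derive from the same value. A minimal Python sketch of that evaluate-once pattern follows. The names randomint and row_with_materialized_value are hypothetical stand-ins for the UDF and the two foreach steps, not Pig APIs.

```python
import random

def randomint(n, rng):
    """Hypothetical stand-in for math.RANDOMINT: an integer in [0, n)."""
    return rng.randrange(n)

def row_with_materialized_value(data, rng):
    # Draw once (relation B), then derive both columns from the stored value (relation C).
    r = randomint(4, rng)
    return (data, r, 1 if r == 3 else 0)

rng = random.Random(0)
rows = [row_with_materialized_value(d, rng) for d in "fghijklm" * 50]

# quarter is 1 exactly when the random column is 3, on every row.
consistent = all((r[1] == 3) == (r[2] == 1) for r in rows)
print("all rows consistent:", consistent)
```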

Is this issue related to PIG-747?
Viraj
