hadoop-pig-dev mailing list archives

From "Viraj Bhat (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-1272) Column pruner causes wrong results
Date Tue, 02 Mar 2010 22:26:27 GMT
Column pruner causes wrong results
----------------------------------

                 Key: PIG-1272
                 URL: https://issues.apache.org/jira/browse/PIG-1272
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.6.0
            Reporter: Viraj Bhat
             Fix For: 0.7.0


For a simple script, the column pruner optimization removes columns of the original
relation that are still needed, so the script produces wrong results.

The input file "kv" contains two tab-separated columns:
{code}
a       1
a       2
a       3
b       4
c       5
c       6
b       7
d       8
{code}

Running the script below in Pig 0.6 produces the output listed after it, which is missing the v column from kv:

{code}
kv = load 'kv' as (k,v);
keys = foreach kv generate k;
keys = distinct keys;
keys = limit keys 2;
rejoin = join keys by k, kv by k;
dump rejoin;
{code}

(a,a)
(a,a)
(a,a)
(b,b)
(b,b)


Running the same script in Pig 0.5, without the column pruner, gives the expected result:
(a,a,1)
(a,a,2)
(a,a,3)
(b,b,4)
(b,b,7)

With the "ColumnPruner" optimization disabled, the script gives the correct results.
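
The two result sets can be cross-checked outside of Pig. The following is a minimal Python sketch of the script's intended semantics, not Pig code and not the optimizer's actual logic. The function name `rejoin` and the `prune_v` flag are hypothetical; `prune_v=True` models the assumption that the pruner drops column v from kv before the join. The sort before the limit is also an assumption made to get a deterministic result, since Pig's limit does not promise which rows it keeps.

```python
# Plain-Python model of the script in this report (hypothetical helper, not Pig code).
KV = [("a", 1), ("a", 2), ("a", 3), ("b", 4),
      ("c", 5), ("c", 6), ("b", 7), ("d", 8)]


def rejoin(kv, limit=2, prune_v=False):
    # keys = foreach kv generate k; keys = distinct keys; keys = limit keys 2;
    # sorted() is an assumption that makes the limit deterministic.
    keys = sorted({k for k, _ in kv})[:limit]
    # Assumption: the faulty pruning behaves like projecting kv down to k alone.
    right = [(k,) for k, _ in kv] if prune_v else kv
    # rejoin = join keys by k, kv by k; all columns of both sides are kept.
    return [(key,) + row for key in keys for row in right if row[0] == key]


if __name__ == "__main__":
    print("expected:", rejoin(KV))
    print("pruned:  ", rejoin(KV, prune_v=True))
```

Under these assumptions, `rejoin(KV)` yields the five three-column rows reported for Pig 0.5, and `rejoin(KV, prune_v=True)` yields the five two-column rows reported for Pig 0.6.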

Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

