hadoop-pig-dev mailing list archives

From "Daniel Dai (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1272) Column pruner causes wrong results
Date Sun, 14 Mar 2010 08:47:27 GMT

     [ https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1272:
----------------------------

    Status: Patch Available  (was: Reopened)

> Column pruner causes wrong results
> ----------------------------------
>
>                 Key: PIG-1272
>                 URL: https://issues.apache.org/jira/browse/PIG-1272
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>            Assignee: Daniel Dai
>             Fix For: 0.7.0
>
>         Attachments: PIG-1272-1.patch, PIG-1272-2.patch
>
>
> For a simple script, the column pruner optimization removes certain columns from the original relation, which produces wrong results.
> The input file "kv" contains the following rows (two tab-separated columns):
> {code}
> a       1
> a       2
> a       3
> b       4
> c       5
> c       6
> b       7
> d       8
> {code}
> Running this script in Pig 0.6:
> {code}
> kv = load 'kv' as (k,v);
> keys = foreach kv generate k;
> keys = distinct keys;
> keys = limit keys 2;
> rejoin = join keys by k, kv by k;
> dump rejoin;
> {code}
> produces the following output, in which column v of kv is missing:
> (a,a)
> (a,a)
> (a,a)
> (b,b)
> (b,b)
> Running the same script in Pig 0.5, without the column pruner, produces:
> (a,a,1)
> (a,a,2)
> (a,a,3)
> (b,b,4)
> (b,b,7)
> Disabling the "ColumnPruner" optimization gives the correct results.
> Viraj
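
For reference, the expected join result can be reproduced outside Pig. The following is a minimal Python sketch of the script's semantics, not Pig code and not part of the patch. The function name `rejoin` and the sorted choice of keys are assumptions for illustration: Pig does not promise which keys `limit` keeps, and sorting here simply matches the keys (a and b) seen in the report.

```python
# Rows of the "kv" input file from the report: (k, v) pairs.
kv = [("a", 1), ("a", 2), ("a", 3), ("b", 4),
      ("c", 5), ("c", 6), ("b", 7), ("d", 8)]


def rejoin(rows, limit=2):
    """Simulate: project k, distinct, limit, then join back to kv on k."""
    # foreach kv generate k; distinct keys; limit keys 2
    # Assumption: sorting picks the same keys (a, b) as in the report.
    keys = sorted({k for k, _ in rows})[:limit]
    # join keys by k, kv by k: both columns of kv must survive the join,
    # so each output row has three fields (keys::k, kv::k, kv::v).
    return [(k, row_k, v) for k in keys for row_k, v in rows if row_k == k]


result = rejoin(kv)
for row in result:
    print(row)
```

Every row carries three fields, matching the Pig 0.5 output above. The Pig 0.6 output has only two fields per row, so the v column of kv was dropped even though the join still needs it.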

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

