pig-dev mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-2721) Wrong output generated while loading bags as input
Date Fri, 25 May 2012 14:22:23 GMT

     [ https://issues.apache.org/jira/browse/PIG-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-2721:
------------------------------

    Attachment: pig-2721-trunk-notestyet.patch

This is the logical plan when run with -t ColumnMapKeyPrune:

{noformat}
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
C: (Name: LOStore Schema: id#11:chararray,bttype#14:chararray,cat#15:long)
|
|---B: (Name: LOForEach Schema: id#11:chararray,bttype#14:chararray,cat#15:long)
    |   |
    |   (Name: LOGenerate[false,true] Schema: id#11:chararray,bttype#14:chararray,cat#15:long)
    |   |   |
    |   |   id:(Name: Project Type: chararray Uid: 11 Input: 0 Column: (*))
    |   |   |
    |   |   mybag:(Name: Project Type: bag Uid: 12 Input: 1 Column: (*))   <==*HERE1*
    |   |
    |   |---(Name: LOInnerLoad[0] Schema: id#11:chararray)
    |   |
    |   |---mybag: (Name: LOInnerLoad[1] Schema: bttype#14:chararray,cat#15:long)
    |
    |---C: (Name: LOSort Schema: id#11:chararray,bttype#14:chararray,cat#15:long) <==*HERE2*
        |   |
        |   id:(Name: Project Type: chararray Uid: 11 Input: 0 Column: 0)
        |
        |---A: (Name: LOForEach Schema: id#11:chararray,mybag#12:bag{#18:tuple(bttype#14:chararray,cat#15:long)})
            |   |
            |   (Name: LOGenerate[false,false] Schema: id#11:chararray,mybag#12:bag{#18:tuple(bttype#14:chararray,cat#15:long)})
            |   |   |
            |   |   (Name: Cast Type: chararray Uid: 11)
            |   |   |
            |   |   |---id:(Name: Project Type: bytearray Uid: 11 Input: 0 Column: (*))
            |   |   |
            |   |   (Name: Cast Type: bag Uid: 12)
            |   |   |
            |   |   |---mybag:(Name: Project Type: bytearray Uid: 12 Input: 1 Column: (*))
            |   |
            |   |---(Name: LOInnerLoad[0] Schema: id#11:bytearray)
            |   |
            |   |---(Name: LOInnerLoad[1] Schema: mybag#12:bytearray)
            |
            |---A: (Name: LOLoad Schema: id#11:bytearray,mybag#12:bytearray)RequiredFields:null

{noformat}


Tracing ColumnPrune*, the use of 'Uid:12' gets lost at the first LOGenerate (*HERE1*), when
its projection refers to LOSort (*HERE2*) and checks that schema.

Looking further, the LOSort schema was not being updated by SchemaPatcher when PushDownForEachFlatten
swapped ForEach and Sort.  This was because PushDownForEachFlatten.reportChange() did not pass
along the changes it made.
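
To illustrate the mechanism, here is a minimal toy model in Python. It is not Pig code: Node, compute_schema, swap_foreach_and_sort and patch_schemas are hypothetical names made up for this sketch, and the real optimizer is considerably more involved. The only point it shows is that a patcher which recomputes schemas for reported nodes leaves the sort with a stale, already-flattened schema when the transformer reports nothing.

```python
# Toy model only: every name here is hypothetical, not a Pig class or method.
from dataclasses import dataclass
from typing import Optional

BAG = "mybag:bag{(bttype,cat)}"

@dataclass
class Node:
    kind: str                        # "load", "foreach_flatten" or "sort"
    source: Optional["Node"] = None  # upstream node, None for the load
    schema: tuple = ()               # cached schema, can go stale

def compute_schema(node):
    """Derive a node's schema from its input's cached schema."""
    if node.kind == "load":
        return ("id", BAG)
    upstream = node.source.schema
    if node.kind == "foreach_flatten":
        # FLATTEN replaces the bag column with the fields of its tuples.
        out = []
        for col in upstream:
            out.extend(["bttype", "cat"] if col == BAG else [col])
        return tuple(out)
    return upstream                  # a sort passes its input schema through

def build_plan():
    """Build load -> foreach_flatten -> sort with fresh cached schemas."""
    load = Node("load")
    foreach = Node("foreach_flatten", load)
    sort = Node("sort", foreach)
    for node in (load, foreach, sort):
        node.schema = compute_schema(node)
    return load, foreach, sort

def swap_foreach_and_sort(load, foreach, sort, report_changes):
    """Rewire to load -> sort -> foreach_flatten; return the reported nodes."""
    sort.source = load
    foreach.source = sort
    return [sort, foreach] if report_changes else []

def patch_schemas(changed):
    """Recompute cached schemas, but only for the nodes that were reported."""
    for node in changed:
        node.schema = compute_schema(node)

# Nothing reported: the sort keeps the flattened schema it had before the swap.
load, foreach, sort = build_plan()
patch_schemas(swap_foreach_and_sort(load, foreach, sort, report_changes=False))
stale = sort.schema

# Changes reported: the sort's schema is recomputed from the load.
load, foreach, sort = build_plan()
patch_schemas(swap_foreach_and_sort(load, foreach, sort, report_changes=True))
fixed = sort.schema
```

In this toy, stale still lists bttype and cat while fixed carries the bag column again, which mirrors the stale and corrected LOSort schemas in the plans in this message.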

I believe the attached patch fixes this issue.
I confirmed that the logical plan now changes to:
{noformat}
$ diff /tmp/before /tmp/after
18c18
<     |---C: (Name: LOSort Schema: id#11:chararray,bttype#14:chararray,cat#15:long)
---
>     |---C: (Name: LOSort Schema: id#11:chararray,mybag#12:bag{#18:tuple(bttype#14:chararray,cat#15:long)})
{noformat}

and it produces the correct output.
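
For reference, a small Python sketch of what FLATTEN followed by ORDER BY id is expected to yield for the sample input quoted in the original report. This is plain Python, not Pig; the helper name flatten_and_order is made up for illustration, the ids are copied exactly as they appear in the report, and plain string ordering is assumed to match the ordering of these ASCII ids.

```python
# Sample rows copied from the quoted report; ids are kept exactly as shown there.
rows = [
    ("...LKGaHqg--", [("aa", 806743)]),
    ("..0MI1Y37w--", [("aa", 498970)]),
    ("..0bnlpJrw--", [("aa", 806740)]),
    ("..0p0IIhbA--", [("aa", 498971), ("se", 498995)]),
    ("..1VkGqvXA--", [("aa", 805219)]),
]

def flatten_and_order(rows):
    """Emit one (id, bttype, cat) row per bag tuple, then sort by id."""
    flattened = [(id_, bttype, cat) for id_, bag in rows for bttype, cat in bag]
    # sorted() is stable, so rows sharing an id keep their bag order here;
    # that tie order is a property of this sketch, not something Pig promises.
    return sorted(flattened, key=lambda row: row[0])

result = flatten_and_order(rows)
for row in result:
    print("\t".join(str(field) for field in row))
```

The five input rows expand to six output rows, because the bag for ..0p0IIhbA-- holds two tuples.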
                
> Wrong output generated while loading bags as input
> --------------------------------------------------
>
>                 Key: PIG-2721
>                 URL: https://issues.apache.org/jira/browse/PIG-2721
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.9.2
>            Reporter: Vivek Padmanabhan
>         Attachments: pig-2721-trunk-notestyet.patch
>
>
> {code}
> A = LOAD '/user/pvivek/sample' as (id:chararray,mybag:bag{tuple(bttype:chararray,cat:long)});
> B = foreach A generate id,FLATTEN(mybag) AS (bttype, cat);
> C = order B by id;
> dump C;
> {code}
> The above code generates wrong results when executed with Pig 0.10 and Pig 0.9.
> Below is the sample input:
> {code}
> ...LKGaHqg--	{(aa,806743)}
> ..0MI1Y37w--	{(aa,498970)}
> ..0bnlpJrw--	{(aa,806740)}
> ..0p0IIhbA--	{(aa,498971),(se,498995)}
> ..1VkGqvXA--	{(aa,805219)}
> {code}
> I think the Pig optimizers are causing this issue. From the logs I can see that $1 is pruned for the relation A.
> [main] INFO  org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for A: $1
> One workaround for this is to disable the rule with -t ColumnMapKeyPrune.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
