pig-dev mailing list archives

From "Daniel Dai (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-3789) tuple in POStream binaryInputQueue keep changing
Date Wed, 19 Mar 2014 00:10:43 GMT

     [ https://issues.apache.org/jira/browse/PIG-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-3789:

    Attachment: PIG-3789-2.patch

The problem is in POValueInputTez. When we read tuples from an edge, they are produced by
BinSedesTuple.readFields, which reuses the tuple: mFields is cleared and rebuilt for every
tuple. When a streaming operation runs asynchronously, a tuple saved to binaryInputQueue
therefore keeps changing. I checked all the other TezLoad operators and they seem fine:
POShuffleTezLoad already makes a copy (Packager.getValueTuple), and POSimpleTezLoad relies on
the loader to create a new tuple. The other TezLoad operators do not send input tuples to
binaryInputQueue.
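
A minimal sketch of the aliasing problem described above, in plain Java with no Pig
classes. ReusingReader, enqueue and the sample rows are hypothetical names invented for
illustration; the reader only mimics the clear-and-refill behaviour attributed to
BinSedesTuple.readFields, and this is not the code in the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ReusedTupleDemo {

    /**
     * Hypothetical stand-in for a reader that reuses one container:
     * every read clears the shared field list and refills it.
     */
    static final class ReusingReader {
        private final List<Object> fields = new ArrayList<>();

        List<Object> read(Object... values) {
            fields.clear();
            for (Object v : values) {
                fields.add(v);
            }
            return fields; // the same instance is returned on every call
        }
    }

    /**
     * Reads three rows and queues each one, either as the shared
     * instance (copy == false) or as a fresh copy (copy == true).
     * Returns the queue contents as seen after all reads are done.
     */
    public static List<List<Object>> enqueue(boolean copy) {
        ReusingReader reader = new ReusingReader();
        BlockingQueue<List<Object>> queue = new LinkedBlockingQueue<>();
        Object[][] rows = { {1, "a"}, {2, "b"}, {3, "c"} };
        for (Object[] row : rows) {
            List<Object> tuple = reader.read(row);
            // Copying detaches the queued value from the reader's buffer.
            queue.add(copy ? new ArrayList<>(tuple) : tuple);
        }
        return new ArrayList<>(queue);
    }

    public static void main(String[] args) {
        System.out.println("without copy: " + enqueue(false));
        System.out.println("with copy:    " + enqueue(true));
    }
}
```

Without the copy, all three queue entries are the same object and show only the last row
read; with the copy, each entry keeps the values it had when it was queued.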

Patch attached.

> tuple in POStream binaryInputQueue keep changing
> ------------------------------------------------
>                 Key: PIG-3789
>                 URL: https://issues.apache.org/jira/browse/PIG-3789
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>    Affects Versions: tez-branch
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: tez-branch
>         Attachments: PIG-3789-1.patch, PIG-3789-2.patch
> Similar to the comments in POSimpleTezLoad:
> {code}
>     /**
>      * Previously, we reused the same Result object for all results, but we found
>      * certain operators (e.g. POStream) save references to the Result object and
>      * expect it to be constant.
>      */
> {code}
> Tuples put into binaryInputQueue have changed by the time they are actually processed. Not exactly
> sure why, but making a copy of the tuple solves the issue.

