pig-dev mailing list archives

From "Xuefu Zhang (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PIG-2537) Output from flatten with a null tuple input generating data inconsistent with the schema
Date Thu, 16 Feb 2012 20:35:00 GMT
Output from flatten with a null tuple input generating data inconsistent with the schema
----------------------------------------------------------------------------------------

                 Key: PIG-2537
                 URL: https://issues.apache.org/jira/browse/PIG-2537
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.9.0, 0.8.0
            Reporter: Xuefu Zhang
            Assignee: Alan Gates


Consider the following Pig script:

grunt> A = load 'file' as ( a : tuple( x, y, z ), b, c );
grunt> B = foreach A generate flatten( $0 ), b, c;
grunt> describe B;
B: {a::x: bytearray,a::y: bytearray,a::z: bytearray,b: bytearray,c: bytearray}

Alias B has a well-defined schema of five fields.

However, on the backend, if $0 happens to be null for a row, the output tuple becomes something
like
(null, b_value, c_value), which is inconsistent with the schema. This behaviour was
confirmed by inspecting the Pig code.

This inconsistency corrupts data because the fields after the flattened tuple shift
position. The expected output row is something like
(null, null, null, b_value, c_value).
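
To make the position shift concrete, here is a minimal Python sketch of the two behaviours. It is not Pig's implementation: the function names flatten_observed and flatten_expected, the tuple_pos and arity parameters, and the sample values are all hypothetical, and Python's None stands in for a Pig null.

```python
from typing import Any, Tuple

Row = Tuple[Any, ...]


def flatten_observed(row: Row, tuple_pos: int) -> Row:
    """Hypothetical model of the reported behaviour: a null tuple
    contributes a single null instead of one null per schema field."""
    out = []
    for i, field in enumerate(row):
        if i == tuple_pos and isinstance(field, tuple):
            out.extend(field)   # non-null tuple: expand its fields
        else:
            out.append(field)   # null tuple (or any other field): passed through as-is
    return tuple(out)


def flatten_expected(row: Row, tuple_pos: int, arity: int) -> Row:
    """Hypothetical model of the expected behaviour: a null tuple is
    padded to `arity` nulls so later fields keep their positions."""
    out = []
    for i, field in enumerate(row):
        if i == tuple_pos:
            if field is None:
                out.extend([None] * arity)  # one null per declared field
            else:
                out.extend(field)
        else:
            out.append(field)
    return tuple(out)


# Assumed sample rows for the schema (a: tuple(x, y, z), b, c).
null_row = (None, "b_value", "c_value")
full_row = (("x1", "y1", "z1"), "b_value", "c_value")

print(flatten_observed(null_row, 0))     # (None, 'b_value', 'c_value')
print(flatten_expected(null_row, 0, 3))  # (None, None, None, 'b_value', 'c_value')
print(flatten_expected(full_row, 0, 3))  # ('x1', 'y1', 'z1', 'b_value', 'c_value')
```

In the first output, b_value lands in the position the schema assigns to a::y, which is the shift described above. Padding with the declared arity (3 here, taken from the schema of $0) keeps b and c at positions 3 and 4.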


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
