pig-dev mailing list archives

From "Michael Prim (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-4326) AvroStorageSchemaConversionUtilities does not properly convert schema for maps of arrays of records
Date Fri, 14 Nov 2014 10:59:33 GMT

    [ https://issues.apache.org/jira/browse/PIG-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212115#comment-14212115 ]

Michael Prim commented on PIG-4326:

Thanks for the feedback. I noticed that during development and was also a bit surprised.
However, removing this extra layer breaks the existing testLoadRecordsWithMapOfRecords
test and would not be backward compatible.

Further, if you create a map<array<InnerRecord>> using Avro avdl files, it is
just syntactic sugar for a map<WrapperRecord>, where WrapperRecord
has one field, namely an array of InnerRecord. As neither the WrapperRecord nor the
array of InnerRecords has an alias of its own, it is a bit confusing that both get the "array" alias.

So we could stick to the old behavior for records and drop the wrapping tuple only for maps
and arrays, but then I think the resulting output would look different from the input.

> AvroStorageSchemaConversionUtilities does not properly convert schema for maps of arrays of records
> ---------------------------------------------------------------------------------------------------
>                 Key: PIG-4326
>                 URL: https://issues.apache.org/jira/browse/PIG-4326
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.12.0, 0.13.0
>            Reporter: Michael Prim
>            Assignee: Michael Prim
>             Fix For: 0.15.0
>         Attachments: mapsOfArraysOfRecords.patch
> I tried to convert the Avro schema of a map of arrays of records into the proper Pig
> schema and always got empty map schemas in Pig.
> The reason is that AvroStorageSchemaConversionUtilities assumes only records
> or primitive types as the content of a map. However, a map of arrays, or a map of maps, can
> have a schema itself and requires a recursive call to derive the full schema.
> I wrote a unit test for maps of arrays of records, which fails with every Pig
> release since AvroStorage was rewritten (I think this was in 0.12), and there have been
> no changes since then in trunk.
> Further, the attached patch contains the (rather simple) fix that makes the schema conversion
> utilities succeed.
> I would appreciate further comments and to hear whether this can be included upstream.
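
The recursion described in the issue can be illustrated with a small toy converter. This is only a sketch in Python, not the attached patch and not the AvroStorageSchemaConversionUtilities code: the names to_pig, recurse_into_maps and PRIMITIVES are hypothetical, the Avro schema is assumed to be given as already-parsed JSON, and the output is a simplified Pig-like notation. With recurse_into_maps=False the converter only looks into map values that are records or primitives, which reproduces the empty map schema; with recursion enabled the nested array of records is resolved.

```python
# Toy model only: Avro schemas as parsed JSON, Pig schemas as plain strings.
# All names here are hypothetical and do not come from the Pig code base.
PRIMITIVES = {
    "int": "int", "long": "long", "float": "float", "double": "double",
    "boolean": "boolean", "string": "chararray", "bytes": "bytearray",
}


def to_pig(avro, recurse_into_maps=True):
    """Render a parsed Avro schema as a simplified Pig-like type string."""
    if isinstance(avro, str):
        return PRIMITIVES[avro]
    kind = avro["type"]
    if kind in PRIMITIVES:
        return PRIMITIVES[kind]
    if kind == "record":
        fields = ", ".join(
            f"{f['name']}:{to_pig(f['type'], recurse_into_maps)}"
            for f in avro["fields"]
        )
        return f"({fields})"
    if kind == "array":
        return "{" + to_pig(avro["items"], recurse_into_maps) + "}"
    if kind == "map":
        values = avro["values"]
        # Assumed old behavior: only records and primitives are handled.
        is_simple = isinstance(values, str) or values["type"] in PRIMITIVES \
            or values["type"] == "record"
        if is_simple or recurse_into_maps:
            return "map[" + to_pig(values, recurse_into_maps) + "]"
        return "map[]"  # nested arrays/maps are lost without recursion
    raise ValueError(f"unsupported Avro type: {kind!r}")


inner = {
    "type": "record",
    "name": "InnerRecord",
    "fields": [{"name": "id", "type": "int"},
               {"name": "name", "type": "string"}],
}
map_of_arrays = {"type": "map", "values": {"type": "array", "items": inner}}

print(to_pig(map_of_arrays, recurse_into_maps=False))  # map[]
print(to_pig(map_of_arrays))  # map[{(id:int, name:chararray)}]
```

A map of plain records converts the same way under both settings, which matches the observation that only maps of arrays or maps of maps were affected.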

This message was sent by Atlassian JIRA
