pig-dev mailing list archives

From "Yi Ou (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-5108) AvroStorage on Tez with exception on nested records
Date Wed, 18 Jan 2017 10:08:26 GMT

    [ https://issues.apache.org/jira/browse/PIG-5108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827771#comment-15827771 ]

Yi Ou commented on PIG-5108:
----------------------------

The issue was also reproduced on AWS EMR 5.2.1, with the following Pig, Tez, and Hadoop versions:

tez.noarch                    0.8.4-1.amzn1                            @Bigtop
pig.noarch                    0.16.0.amzn.0-1.amzn1                    @Bigtop
hadoop.x86_64                 2.7.3.amzn.1-1.amzn1                     @Bigtop

> AvroStorage on Tez with exception on nested records
> ---------------------------------------------------
>
>                 Key: PIG-5108
>                 URL: https://issues.apache.org/jira/browse/PIG-5108
>             Project: Pig
>          Issue Type: Bug
>          Components: tez
>    Affects Versions: 0.16.0
>         Environment: HadoopVersion: 2.6.0-cdh5.8.0
> PigVersion: 0.16.0
> TezVersion: 0.7.0
>            Reporter: Sebastian Geller
>             Fix For: 0.17.0, 0.16.1
>
>         Attachments: person-prop.avro
>
>
> Hi,
> While migrating to the latest Pig version, we have seen a general issue when using nested Avro records on Tez:
> {code}
> Caused by: java.io.IOException: class org.apache.pig.impl.util.avro.AvroTupleWrapper.write called, but not implemented yet
> 	at org.apache.pig.impl.util.avro.AvroTupleWrapper.write(AvroTupleWrapper.java:68)
> 	at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139)
> ...
> {code}
> The setup is as follows:
> schema
> {code}
> {
>     "fields": [
>         {
>             "name": "id",
>             "type": "int"
>         },
>         {
>             "name": "property",
>             "type": {
>                 "fields": [
>                     {
>                         "name": "id",
>                         "type": "int"
>                     }
>                 ],
>                 "name": "Property",
>                 "type": "record"
>             }
>         }
>     ],
>     "name": "Person",
>     "namespace": "com.github.ouyi.avro",
>     "type": "record"
> }
> {code}
> Pig script group_person.pig
> {code}
> loaded_person =
>     LOAD '$input'
>     USING AvroStorage();
> grouped_records =
>     GROUP
>         loaded_person BY (property.id);
> STORE grouped_records
>     INTO '$output'
>     USING AvroStorage();
> {code}
> sample data
> {code}
> {"id":1,"property":{"id":1}}
> {code}
> Execution on Tez
> {code}
> pig -x tez_local -p input=file:///usr/lib/pig/pig-0.16.0/person-prop.avro -p output=file:///output group_person.pig
> ...
> Caused by: java.io.IOException: class org.apache.pig.impl.util.avro.AvroTupleWrapper.write called, but not implemented yet
> 	at org.apache.pig.impl.util.avro.AvroTupleWrapper.write(AvroTupleWrapper.java:68)
> 	at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139)
> ...
> {code}
> Execution on mapred
> {code}
> pig -x local -p input=file:///usr/lib/pig/pig-0.16.0/person-prop.avro -p output=file:///output7 group_person.pig
> ...
> Output(s):
> Successfully stored 1 records in: "file:///output7"
> ...
> {code}
> I am going to attach the complete log files of both runs.
> I assume the Pig script should work regardless of whether it runs on Tez or MapReduce? Is there any underlying change when migrating to Tez that makes the schema invalid?
> Thanks,
> Sebastian
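For readers without the attachment, the input file can likely be recreated from the schema and the sample record quoted in the description. This is a minimal sketch, not part of the original report. It assumes a POSIX shell and a locally downloaded Apache Avro avro-tools jar, whose `fromjson --schema-file` command converts JSON-encoded records into an Avro data file. The `AVRO_TOOLS_JAR` variable and the file names `person.avsc` and `person.json` are hypothetical.

```shell
#!/bin/sh
# Sketch: recreate the sample input (person-prop.avro) from the schema and
# record quoted in the issue. File names below are hypothetical.
set -eu

# Avro schema from the description, saved as person.avsc.
cat > person.avsc <<'EOF'
{
    "fields": [
        {"name": "id", "type": "int"},
        {
            "name": "property",
            "type": {
                "fields": [{"name": "id", "type": "int"}],
                "name": "Property",
                "type": "record"
            }
        }
    ],
    "name": "Person",
    "namespace": "com.github.ouyi.avro",
    "type": "record"
}
EOF

# One record per line, in Avro's JSON encoding (nested record = nested object).
cat > person.json <<'EOF'
{"id":1,"property":{"id":1}}
EOF

# AVRO_TOOLS_JAR is an assumption: set it to the path of a local avro-tools
# jar. If it is unset, only the schema and JSON files are written.
if [ -n "${AVRO_TOOLS_JAR:-}" ]; then
    java -jar "$AVRO_TOOLS_JAR" fromjson --schema-file person.avsc person.json > person-prop.avro
else
    echo "AVRO_TOOLS_JAR not set; skipping conversion to person-prop.avro" >&2
fi
```

The resulting person-prop.avro could then be passed as `input` to the `pig -x tez_local` and `pig -x local` commands in the description to compare the two runs.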



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
