hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Anthony Hsu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema
Date Thu, 17 Apr 2014 19:02:15 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973279#comment-13973279
] 

Anthony Hsu commented on HIVE-6835:
-----------------------------------

The AvroSerDe handles schema evolution as described in http://avro.apache.org/docs/current/spec.html#Schema+Resolution.
 However, in the Hive code, the AvroSerDe needs to always be initialized with the latest schema
so that ObjectInspectorConverters.getConvertedOI() (in FetchOperator:getRecordReader()) will
work.  When the AvroSerDe actually reads the Avro file, it will then compare the latest schema
to the actual schema stored in the Avro file and do schema resolution/evolution.

> Reading of partitioned Avro data fails if partition schema does not match table schema
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-6835
>                 URL: https://issues.apache.org/jira/browse/HIVE-6835
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Anthony Hsu
>            Assignee: Anthony Hsu
>         Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch
>
>
> To reproduce:
> {code}
> create table testarray (a array<string>);
> load data local inpath '/home/ahsu/test/array.txt' into table testarray;
> # create partitioned Avro table with one array column
> create table avroarray partitioned by (y string) row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
with serdeproperties ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":
"record", "fields": [ { "name":"a", "type":{"type":"array","items":"string"} } ] }')  STORED
as INPUTFORMAT  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'  OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
> insert into table avroarray partition(y=1) select * from testarray;
> # add an int column with a default value of 0
> alter table avroarray set serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":
"record", "fields": [ {"name":"intfield","type":"int","default":0},{ "name":"a", "type":{"type":"array","items":"string"}
} ] }');
> # fails with ClassCastException
> select * from avroarray;
> {code}
> The select * fails with:
> {code}
> Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector
cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message