hive-dev mailing list archives

From "Anthony Hsu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema
Date Wed, 23 Apr 2014 01:25:15 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977710#comment-13977710 ]

Anthony Hsu commented on HIVE-6835:
-----------------------------------

Yes, this is possible, but I would have to add these "instanceof AbstractSerDe" checks and
then cast the Deserializer to an AbstractSerDe before I can use the new initialize() method.
There are dozens of calls to .initialize(), and adding all this type-checking and casting code
in so many places just for one new method doesn't seem very clean to me.
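To make the call-site cost concrete, here is a minimal Java sketch. The types are hypothetical, simplified stand-ins (the real Hive Deserializer.initialize() also takes a Hadoop Configuration), and the two-argument initialize() is the proposal under discussion, not an existing API. The SerDeCallSites helper is likewise hypothetical; it shows the check and cast that each caller holding a plain Deserializer reference would need.

```java
import java.util.Properties;

// HYPOTHETICAL, simplified stand-in for Hive's Deserializer; the real
// initialize(...) also takes a Hadoop Configuration.
interface Deserializer {
    void initialize(Properties props);
}

// HYPOTHETICAL base class carrying the proposed two-argument initialize().
abstract class AbstractSerDe implements Deserializer {
    public void initialize(Properties tblProps, Properties partProps) {
        // Default: ignore the table properties and fall back to the old method.
        initialize(partProps);
    }
}

final class SerDeCallSites {
    // Every call site that only has a Deserializer reference would need this
    // instanceof check and cast before it can reach the new method.
    static void initialize(Deserializer d, Properties tblProps, Properties partProps) {
        if (d instanceof AbstractSerDe) {
            ((AbstractSerDe) d).initialize(tblProps, partProps);
        } else {
            // Plain Deserializer implementations only accept one Properties object.
            d.initialize(partProps);
        }
    }
}
```

Multiplied across dozens of call sites, this branch is the boilerplate being objected to.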

Also, if we add the new initialize() method, what should we do for table-level serde initialization?
When dealing with the table, there are no partition properties, so are we supposed to pass
the table properties for both the tblProps and partProps arguments? If we leave partProps
null, then the default implementation of the new initialize() method will just pass null to
the old initialize() method.
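A small Java sketch of that ambiguity, assuming (as described above) that the default two-argument initialize() forwards partProps to the old one. AbstractSerDe here is a simplified, hypothetical stand-in, LegacySerDe is a made-up serde that only implements the old method, and the property value is a placeholder, not a real schema.

```java
import java.util.Properties;

// HYPOTHETICAL simplified base class: the proposed initialize(tblProps, partProps)
// forwards partProps to the old single-argument method.
abstract class AbstractSerDe {
    public abstract void initialize(Properties props);

    public void initialize(Properties tblProps, Properties partProps) {
        initialize(partProps);
    }
}

// HYPOTHETICAL serde that only implements the old method and records its input.
final class LegacySerDe extends AbstractSerDe {
    Properties received;
    boolean called;

    @Override
    public void initialize(Properties props) {
        called = true;
        received = props;
    }
}

final class TableLevelInit {
    public static void main(String[] args) {
        Properties tblProps = new Properties();
        // Placeholder value; a real table would hold an Avro schema here.
        tblProps.setProperty("avro.schema.literal", "<table schema>");

        // Option 1: there is no partition, so partProps is left null.
        LegacySerDe a = new LegacySerDe();
        a.initialize(tblProps, null);
        System.out.println(a.received);             // null: the table properties never arrive

        // Option 2: pass the table properties for both arguments.
        LegacySerDe b = new LegacySerDe();
        b.initialize(tblProps, tblProps);
        System.out.println(b.received == tblProps); // true
    }
}
```

Option 1 silently drops the table properties for any serde relying on the default; option 2 works but relies on every caller knowing the convention.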

There doesn't seem to be a very clean way of adding a new initialize() method without creating
a lot of redundant boilerplate code and creating confusion about which initialize() method to use
and what values to pass in. Given these concerns, I feel that prepending "table." might be
a cleaner and less confusing approach. What are your thoughts on this?
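One way to read the "table." idea, sketched in Java under stated assumptions: the table properties are copied into the partition properties under a "table." prefix, so the existing single-argument initialize() signature stays unchanged and a serde such as AvroSerDe could look up both schemas from one Properties object. The helper name withTableProps and the exact merge rules are hypothetical, not taken from any of the attached patches.

```java
import java.util.Properties;

final class TablePrefixedProps {
    // HYPOTHETICAL prefix constant and helper illustrating the "table." approach.
    static final String TABLE_PREFIX = "table.";

    // Returns a new Properties holding every partition property unchanged,
    // plus every table property under "table.<key>". Inputs are not modified.
    static Properties withTableProps(Properties tblProps, Properties partProps) {
        Properties merged = new Properties();
        for (String key : partProps.stringPropertyNames()) {
            merged.setProperty(key, partProps.getProperty(key));
        }
        for (String key : tblProps.stringPropertyNames()) {
            merged.setProperty(TABLE_PREFIX + key, tblProps.getProperty(key));
        }
        return merged;
    }

    public static void main(String[] args) {
        Properties tbl = new Properties();
        tbl.setProperty("avro.schema.literal", "<new table schema>");
        Properties part = new Properties();
        part.setProperty("avro.schema.literal", "<schema the partition was written with>");

        Properties p = withTableProps(tbl, part);
        System.out.println(p.getProperty("avro.schema.literal"));       // <schema the partition was written with>
        System.out.println(p.getProperty("table.avro.schema.literal")); // <new table schema>
    }
}
```

The trade-off is a naming convention instead of a type check: callers need no casts, but a partition property that already starts with "table." would be overwritten by the table's value.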

> Reading of partitioned Avro data fails if partition schema does not match table schema
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-6835
>                 URL: https://issues.apache.org/jira/browse/HIVE-6835
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Anthony Hsu
>            Assignee: Anthony Hsu
>         Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch
>
>
> To reproduce:
> {code}
> create table testarray (a array<string>);
> load data local inpath '/home/ahsu/test/array.txt' into table testarray;
> -- create partitioned Avro table with one array column
> create table avroarray partitioned by (y string)
>   row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   with serdeproperties ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":"record", "fields": [ { "name":"a", "type":{"type":"array","items":"string"} } ] }')
>   STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
> insert into table avroarray partition(y=1) select * from testarray;
> -- add an int column with a default value of 0
> alter table avroarray set serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   with serdeproperties('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":"record", "fields": [ {"name":"intfield","type":"int","default":0},{ "name":"a", "type":{"type":"array","items":"string"} } ] }');
> -- fails with ClassCastException
> select * from avroarray;
> {code}
> The select * fails with:
> {code}
> Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
> {code}
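One plausible reading of this trace, offered as an assumption rather than something stated in the issue, is that the existing partition is still read with the schema it was written with (a single array column), while its fields are matched against the table's new schema by position, so column 0 pairs an array with an int. The toy Java sketch below models only that positional pairing; Column, Category and firstMismatch are hypothetical and are not Hive's ObjectInspector classes. It needs Java 16+ for the record.

```java
import java.util.List;

final class PositionalMismatch {
    // HYPOTHETICAL simplified model: a column is just a name and a type category.
    enum Category { PRIMITIVE, LIST }

    record Column(String name, Category category) {}

    // Pairs partition and table columns by position and returns the first
    // index whose categories differ, or -1 if the overlapping prefix agrees.
    static int firstMismatch(List<Column> partition, List<Column> table) {
        int n = Math.min(partition.size(), table.size());
        for (int i = 0; i < n; i++) {
            if (partition.get(i).category() != table.get(i).category()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Schema the y=1 partition was written with.
        List<Column> partition = List.of(new Column("a", Category.LIST));
        // Table schema after the ALTER TABLE above.
        List<Column> table = List.of(
            new Column("intfield", Category.PRIMITIVE),
            new Column("a", Category.LIST));
        System.out.println(firstMismatch(partition, table)); // 0
    }
}
```

Under this reading, a list-typed partition column would be handed to code expecting a primitive at index 0, which is consistent with the StandardListObjectInspector to PrimitiveObjectInspector cast in the message.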



--
This message was sent by Atlassian JIRA
(v6.2#6252)
