Date: Fri, 18 Apr 2014 02:41:15 +0000 (UTC)
From: "Xuefu Zhang (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema

[ https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973722#comment-13973722 ]

Xuefu Zhang commented on HIVE-6835:
-----------------------------------

[~erwaman] Thanks for the explanation. Now I see where the problem is. SERDEPROPERTIES and TBLPROPERTIES serve different purposes. I'm curious why a user would put avro.schema.literal in the serde properties: the schema is table-specific, so it should go in TBLPROPERTIES.
SERDEPROPERTIES, on the other hand, is used to control serde behavior (at the plugin level rather than the table level), such as the field delimiter, which doesn't necessarily vary from table to table. If you check the AvroSerDe documentation (https://cwiki.apache.org/confluence/display/Hive/AvroSerDe), the schema is specified in TBLPROPERTIES. Thus, it seems that this fix is for an invalid use case. What are your thoughts on this?

> Reading of partitioned Avro data fails if partition schema does not match table schema
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-6835
>                 URL: https://issues.apache.org/jira/browse/HIVE-6835
>             Project: Hive
>          Issue Type: Bug
>   Affects Versions: 0.12.0
>            Reporter: Anthony Hsu
>            Assignee: Anthony Hsu
>         Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch
>
>
> To reproduce:
> {code}
> create table testarray (a array<string>);
> load data local inpath '/home/ahsu/test/array.txt' into table testarray;
> # create partitioned Avro table with one array column
> create table avroarray partitioned by (y string) row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type": "record", "fields": [ { "name":"a", "type":{"type":"array","items":"string"} } ] }') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
> insert into table avroarray partition(y=1) select * from testarray;
> # add an int column with a default value of 0
> alter table avroarray set serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties('avro.schema.literal'='{"namespace":"test","name":"avroarray","type": "record", "fields": [ {"name":"intfield","type":"int","default":0},{ "name":"a", "type":{"type":"array","items":"string"} } ] }');
> # fails with ClassCastException
> select * from avroarray;
> {code}
> The select * fails
with:
> {code}
> Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
> {code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
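
For reference, here is a minimal sketch of the form the comment describes: avro.schema.literal placed in TBLPROPERTIES rather than SERDEPROPERTIES, as on the AvroSerDe wiki page linked in the comment. It reuses the table name and schemas from the reproduction steps. It is an illustration only, and nothing in this message establishes whether this form avoids the ClassCastException.

{code}
-- Sketch only: same table as in the reproduction, with the schema in TBLPROPERTIES
create table avroarray
  partitioned by (y string)
  row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  stored as
    inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  tblproperties ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":"record","fields":[{"name":"a","type":{"type":"array","items":"string"}}]}');

-- Adding the int column then becomes a table-property update instead of a serde change
alter table avroarray set tblproperties ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":"record","fields":[{"name":"intfield","type":"int","default":0},{"name":"a","type":{"type":"array","items":"string"}}]}');
{code}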