Date: Fri, 11 Apr 2014 09:17:15 +0000 (UTC)
From: "Szehon Ho (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

    [ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966355#comment-13966355 ]

Szehon Ho commented on HIVE-6785:
---------------------------------

Hi Tongjie, these are deprecated now and will be removed. See the discussion on HIVE-6757 for the current state.
Use 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe', 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat', and 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' instead.

> query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-6785
>                 URL: https://issues.apache.org/jira/browse/HIVE-6785
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats, Serializers/Deserializers
>    Affects Versions: 0.13.0
>            Reporter: Tongjie Chen
>             Fix For: 0.14.0
>
>         Attachments: HIVE-6785.1.patch.txt, HIVE-6785.2.patch.txt
>
>
> When a Hive table's SerDe is ParquetHiveSerDe while some partitions use a different SerDe, and the table has string column(s), Hive generates a confusing error message:
> "Failed with exception java.io.IOException:java.lang.ClassCastException: parquet.hive.serde.primitive.ParquetStringInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableTimestampObjectInspector"
> This is confusing because timestamp is mentioned even though the table does not use it. The reason is that when the table and partition SerDes differ, Hive tries to convert between the object inspectors of the two SerDes. ParquetHiveSerDe's object inspector for the string type is ParquetStringInspector (newly introduced), which is neither a subclass of WritableStringObjectInspector nor of JavaStringObjectInspector, the types ObjectInspectorConverters expects for a string-category object inspector. There is no break statement in the STRING case, so execution falls through to the following TIMESTAMP case, producing the confusing error message.
> See also the following Parquet issue:
> https://github.com/Parquet/parquet-mr/issues/324
> The fix is relatively easy: make ParquetStringInspector a subclass of JavaStringObjectInspector instead of AbstractPrimitiveJavaObjectInspector. But because the constructor of JavaStringObjectInspector is package-private rather than public or protected, we would need to move ParquetStringInspector into the same package as JavaStringObjectInspector.
> ArrayWritableObjectInspector's setStructFieldData also needs to accept List data, since the corresponding setStructFieldData and create methods return a list. This is also needed when the table SerDe is ParquetHiveSerDe and the partition SerDe is something else.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
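
For readers unfamiliar with the fall-through described in the issue, here is a minimal, self-contained sketch of the mechanism. It is not the Hive source: FallThroughDemo, Category, the Inspector interfaces, and both convert methods are hypothetical stand-ins, with CustomStringInspector playing the role of ParquetStringInspector (string category, but not one of the expected string inspector types).

```java
// Hypothetical stand-ins only; none of these are the real Hive classes.
public class FallThroughDemo {
    enum Category { STRING, TIMESTAMP }

    interface Inspector { Category category(); }
    interface SettableStringInspector extends Inspector {}
    interface SettableTimestampInspector extends Inspector {}

    // Reports the STRING category but is not a SettableStringInspector,
    // mirroring the situation the issue describes.
    static final class CustomStringInspector implements Inspector {
        public Category category() { return Category.STRING; }
    }

    static final class StandardStringInspector implements SettableStringInspector {
        public Category category() { return Category.STRING; }
    }

    // Buggy shape: the STRING case neither breaks nor throws when the
    // instanceof check fails, so control falls into the TIMESTAMP case.
    static String convertBuggy(Inspector in) {
        switch (in.category()) {
            case STRING:
                if (in instanceof SettableStringInspector) {
                    return "string converter";
                }
                // missing break/throw: falls through
            case TIMESTAMP:
                // Throws ClassCastException for a string inspector.
                SettableTimestampInspector ts = (SettableTimestampInspector) in;
                return "timestamp converter for " + ts.category();
            default:
                throw new IllegalArgumentException("Unknown category");
        }
    }

    // One possible guard: fail inside the STRING case with an error that
    // names the actual problem instead of mentioning timestamps.
    static String convertFixed(Inspector in) {
        switch (in.category()) {
            case STRING:
                if (in instanceof SettableStringInspector) {
                    return "string converter";
                }
                throw new IllegalArgumentException(
                    "Unsupported string inspector: " + in.getClass().getSimpleName());
            case TIMESTAMP:
                SettableTimestampInspector ts = (SettableTimestampInspector) in;
                return "timestamp converter for " + ts.category();
            default:
                throw new IllegalArgumentException("Unknown category");
        }
    }

    public static void main(String[] args) {
        Inspector custom = new CustomStringInspector();
        try {
            convertBuggy(custom);
        } catch (ClassCastException e) {
            System.out.println("buggy: " + e.getClass().getSimpleName());
        }
        try {
            convertFixed(custom);
        } catch (IllegalArgumentException e) {
            System.out.println("fixed: " + e.getMessage());
        }
    }
}
```

The guard in convertFixed only makes the error message accurate. The fix proposed in the issue is different: it makes the Parquet inspector a subclass of the expected string inspector type, so the instanceof check succeeds and the conversion works.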