Date: Fri, 22 Aug 2014 18:02:11 +0000 (UTC)
From: "Szehon Ho (JIRA)"
To: hive-dev@hadoop.apache.org
Subject: [jira] [Commented] (HIVE-7850) Hive Query failed if the data type is array with parquet files

    [ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107200#comment-14107200 ]

Szehon Ho commented on HIVE-7850:
---------------------------------

Hi Satish, can you please fix the formatting? Indents are 2 spaces (Hive code follows that convention), and put a space after each comma, etc. Otherwise it looks good to me. That said, I'm not an expert on the Parquet schema, so my only question is whether it is compatible with other tools? + [~jcoffey], [~rdblue] for comments (if any).

> Hive Query failed if the data type is array with parquet files
> ----------------------------------------------------------------------
>
>           Key: HIVE-7850
>           URL: https://issues.apache.org/jira/browse/HIVE-7850
>       Project: Hive
>    Issue Type: Bug
>    Components: Serializers/Deserializers
> Affects Versions: 0.14.0, 0.13.1
>      Reporter: Sathish
>        Labels: parquet, serde
>       Fix For: 0.14.0
>
>   Attachments: HIVE-7850.patch
>
>
> * Created a Parquet file from an Avro file that has one array-typed field; the rest are primitive types. The Avro schema of the array field is, e.g.:
> {code}
> { "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, "null" ] }
> {code}
> * Created an external Hive table with the array type as below:
> {code}
> create external table paraArray (action array<string>)
>   partitioned by (partitionid int)
>   row format serde 'parquet.hive.serde.ParquetHiveSerDe'
>   stored as
>     inputformat 'parquet.hive.MapredParquetInputFormat'
>     outputformat 'parquet.hive.MapredParquetOutputFormat'
>   location '/testPara';
>
> alter table paraArray add partition(partitionid=1) location '/testPara';
> {code}
> * Ran the following query (select action from paraArray limit 10), and the MapReduce jobs fail with the following exception:
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable
>   at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
>   at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
>   at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
>   at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
>   at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
>   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
>   at org.apache.hadoop.mapred.Child.main(Child.java:264)
> ]
>   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
>   ... 8 more
> {code}
> This issue was posted long ago on the Parquet issues list, and since it is related to the Parquet Hive SerDe, I have created this Hive issue. The details and history are in the linked thread: https://github.com/Parquet/parquet-mr/issues/281.
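
For anyone following the stack trace: the cast fails inside ParquetHiveArrayInspector.getList because the value handed to the array inspector is a single BinaryWritable rather than the ArrayWritable the inspector expects. The sketch below is illustrative only; it is not the attached HIVE-7850.patch and not Hive's actual code, and the class and method names (ArrayInspectorSketch, asArrayElements) are made up. It only shows the kind of defensive handling that avoids the blind cast when another tool wrote the Parquet file with a different array layout.

{code}
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Writable;

public class ArrayInspectorSketch {
  // Hypothetical helper, not part of Hive: interpret a deserialized Parquet
  // value as a list of array elements without casting blindly to ArrayWritable.
  static List<Writable> asArrayElements(Object data) {
    if (data == null) {
      return null;
    }
    if (data instanceof ArrayWritable) {
      // Expected case: the column value really is an ArrayWritable.
      return Arrays.asList(((ArrayWritable) data).get());
    }
    if (data instanceof Writable) {
      // The case the stack trace shows failing: a single value (here a
      // BinaryWritable) that was never wrapped in an ArrayWritable.
      return Arrays.asList((Writable) data);
    }
    throw new UnsupportedOperationException(
        "Cannot interpret " + data.getClass().getName() + " as an array");
  }
}
{code}

Whether the proper fix belongs in the inspector or in how the table's Parquet schema is converted is exactly the compatibility question raised in the comment above.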