Date: Fri, 30 Jan 2015 20:50:35 +0000 (UTC)
From: "Brock Noland (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-9502) Parquet cannot read Map types from files written with Hive <= 0.12

    [ https://issues.apache.org/jira/browse/HIVE-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299208#comment-14299208 ]

Brock Noland commented on HIVE-9502:
------------------------------------

Thank you Sergio! I have committed this to trunk and branch-1.1.

> Parquet cannot read Map types from files written with Hive <= 0.12
> ------------------------------------------------------------------
>
>                 Key: HIVE-9502
>                 URL: https://issues.apache.org/jira/browse/HIVE-9502
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>             Fix For: 1.1.0
>
>         Attachments: HIVE-9502.1.patch, HIVE-9502.2.patch, HIVE-9502.3.patch, HIVE-9502.4.patch, alltypesparquet
>
>
> When reading a Parquet file written by Hive <= 0.12, the following error is thrown:
> {noformat}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>     at org.apache.hadoop.hive.ql.io.parquet.serde.AbstractParquetMapInspector.getMap(AbstractParquetMapInspector.java:73)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:519)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:443)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:427)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:582)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>     at org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
>     ... 9 more
> {noformat}
> This is because old versions of Hive (<= 0.12) write Map types using the following schema:
> {noformat}
> optional group m1 (MAP_KEY_VALUE) {
> 	repeated group map {
> 		required binary key;
> 		optional binary value;
> 	}
> }
> {noformat}
> PARQUET-113 mentions new annotations for Parquet nested types.
> https://github.com/rdblue/incubator-parquet-format/blob/PARQUET-113-add-list-and-map-spec/LogicalTypes.md#maps
> And now the correct schema is:
> {noformat}
> optional group m1f (MAP) {
> 	repeated group map (MAP_KEY_VALUE) {
> 		required binary key;
> 		optional binary value;
> 	}
> }
> {noformat}
> We should be backwards compatible with the old schema as well.
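
For illustration only, the difference between the two layouts could be checked roughly as follows. This is a minimal sketch, not the committed patch: `Node`, `isMapGroup` and `isLegacyMapGroup` are hypothetical names that stand in for the real Parquet schema classes, and the second field of the repeated group is assumed to be named `value`.

```java
import java.util.Arrays;
import java.util.List;

public class MapSchemaCompat {

    /** Hypothetical stand-in for a Parquet schema node (not the parquet-mr API). */
    static final class Node {
        final String name;
        final String annotation;   // "MAP", "MAP_KEY_VALUE", or null when absent
        final List<Node> children; // empty for primitive leaves

        Node(String name, String annotation, Node... children) {
            this.name = name;
            this.annotation = annotation;
            this.children = Arrays.asList(children);
        }
    }

    /** Accepts both layouts: outer group tagged MAP (new) or MAP_KEY_VALUE (Hive <= 0.12). */
    static boolean isMapGroup(Node outer) {
        boolean tagged = "MAP".equals(outer.annotation)
                || "MAP_KEY_VALUE".equals(outer.annotation);
        // Both layouts wrap exactly one repeated group that holds two fields.
        return tagged
                && outer.children.size() == 1
                && outer.children.get(0).children.size() == 2;
    }

    /** True when the outer group carries the legacy MAP_KEY_VALUE annotation. */
    static boolean isLegacyMapGroup(Node outer) {
        return isMapGroup(outer) && "MAP_KEY_VALUE".equals(outer.annotation);
    }

    public static void main(String[] args) {
        // Layout written by Hive <= 0.12: annotation on the outer group only.
        Node oldStyle = new Node("m1", "MAP_KEY_VALUE",
                new Node("map", null, new Node("key", null), new Node("value", null)));
        // Layout from PARQUET-113: MAP outside, MAP_KEY_VALUE on the repeated group.
        Node newStyle = new Node("m1", "MAP",
                new Node("map", "MAP_KEY_VALUE", new Node("key", null), new Node("value", null)));

        System.out.println("old: map=" + isMapGroup(oldStyle) + " legacy=" + isLegacyMapGroup(oldStyle));
        System.out.println("new: map=" + isMapGroup(newStyle) + " legacy=" + isLegacyMapGroup(newStyle));
    }
}
```

Both sample schemas are accepted as maps, and only the first is flagged as legacy. A reader written along these lines would not depend on which group carries the annotation.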