Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Wed, 28 Jan 2015 23:15:35 +0000 (UTC)
From: =?utf-8?Q?Sergio_Pe=C3=B1a_=28JIRA=29?= <jira@apache.org>
To: hive-dev@hadoop.apache.org
Message-ID: <JIRA.12770818.1422483677000.199533.1422486935214@Atlassian.JIRA>
In-Reply-To: <JIRA.12770818.1422483677000@Atlassian.JIRA>
References: <JIRA.12770818.1422483677000@Atlassian.JIRA>
 <JIRA.12770818.1422483677720@arcas>
Subject: [jira] [Updated] (HIVE-9502) Parquet cannot read Map types from
 files written with Hive <= 0.12
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


     [ https://issues.apache.org/jira/browse/HIVE-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-9502:
------------------------------
    Attachment: alltypesparquet

> Parquet cannot read Map types from files written with Hive <= 0.12
> ------------------------------------------------------------------
>
>                 Key: HIVE-9502
>                 URL: https://issues.apache.org/jira/browse/HIVE-9502
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-9502.1.patch, HIVE-9502.2.patch, HIVE-9502.3.patch, alltypesparquet
>
>
> When reading a Parquet file written by Hive <= 0.12, the following error is thrown:
> {noformat}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>         at org.apache.hadoop.hive.ql.io.parquet.serde.AbstractParquetMapInspector.getMap(AbstractParquetMapInspector.java:73)
>         at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:519)
>         at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:443)
>         at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:427)
>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:582)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>         at org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>         at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>         at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
>         ... 9 more
> {noformat}
> This is because old versions of Hive (<= 0.12) write Map types using the following schema:
> {noformat}
> optional group m1 (MAP_KEY_VALUE) {
>     repeated group map {
>         required binary key;
>         optional binary value;
>     }
> }
> {noformat}
> PARQUET-113 mentions new annotations for Parquet nested types.
> https://github.com/rdblue/incubator-parquet-format/blob/PARQUET-113-add-list-and-map-spec/LogicalTypes.md#maps
> And now the correct schema is:
> {noformat}
> optional group m1f (MAP) {
>     repeated group map (MAP_KEY_VALUE) {
>         required binary key;
>         optional binary value;
>     }
> }
> {noformat}
> We should be backward compatible with the old schema as well.
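
For illustration only, here is a minimal sketch of what accepting both layouts could look like. It is not the attached patch and it does not use the Parquet or Hive APIs: `Node`, `MapSchemaCompat` and `findKeyValueGroup` are hypothetical names, and the schema is modelled with a simplified stand-in class. The assumption is that a reader can treat the outer group as a map when it carries either `MAP` or `MAP_KEY_VALUE`, and then take its single repeated child as the key/value group regardless of that child's own annotation.

```java
import java.util.Arrays;
import java.util.List;

public class MapSchemaCompat {

    /** Hypothetical, simplified stand-in for a schema node (not the Parquet API). */
    static final class Node {
        final String repetition;   // "required", "optional" or "repeated"
        final String name;
        final String annotation;   // "MAP", "MAP_KEY_VALUE", or null when absent
        final List<Node> children; // empty for primitive fields

        Node(String repetition, String name, String annotation, Node... children) {
            this.repetition = repetition;
            this.name = name;
            this.annotation = annotation;
            this.children = Arrays.asList(children);
        }
    }

    /**
     * Returns the repeated key/value group of a map field.
     * Accepts the old layout (outer group annotated MAP_KEY_VALUE) and the
     * newer one (outer group annotated MAP); the inner annotation is ignored.
     */
    static Node findKeyValueGroup(Node mapField) {
        boolean looksLikeMap = "MAP".equals(mapField.annotation)
                || "MAP_KEY_VALUE".equals(mapField.annotation);
        if (!looksLikeMap) {
            throw new IllegalArgumentException(mapField.name + " is not annotated as a map");
        }
        if (mapField.children.size() != 1) {
            throw new IllegalArgumentException(mapField.name + " must contain exactly one group");
        }
        Node inner = mapField.children.get(0);
        int fields = inner.children.size();
        if (!"repeated".equals(inner.repetition) || fields < 1 || fields > 2) {
            throw new IllegalArgumentException(mapField.name + " has no repeated key/value group");
        }
        return inner;
    }

    public static void main(String[] args) {
        // Old layout, as written by Hive <= 0.12 according to the description above.
        Node oldStyle = new Node("optional", "m1", "MAP_KEY_VALUE",
                new Node("repeated", "map", null,
                        new Node("required", "key", null),
                        new Node("optional", "value", null)));
        // Newer layout with the annotations described in PARQUET-113.
        Node newStyle = new Node("optional", "m1f", "MAP",
                new Node("repeated", "map", "MAP_KEY_VALUE",
                        new Node("required", "key", null),
                        new Node("optional", "value", null)));

        for (Node field : Arrays.asList(oldStyle, newStyle)) {
            Node keyValue = findKeyValueGroup(field);
            System.out.println(field.name + " -> " + keyValue.name
                    + " with " + keyValue.children.size() + " fields");
        }
    }
}
```

Resolving the key/value group by structure rather than by the position of the `MAP_KEY_VALUE` annotation is one way to keep a single read path for both file generations; a real fix would need to apply the same idea to the Parquet schema and converter classes Hive actually uses.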


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)