hive-dev mailing list archives

From Sergio Peña (JIRA) <j...@apache.org>
Subject [jira] [Updated] (HIVE-9502) Parquet cannot read Map types from files written with Hive <= 0.12
Date Thu, 29 Jan 2015 20:30:35 GMT

     [ https://issues.apache.org/jira/browse/HIVE-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-9502:
------------------------------
    Attachment: HIVE-9502.4.patch

> Parquet cannot read Map types from files written with Hive <= 0.12
> ------------------------------------------------------------------
>
>                 Key: HIVE-9502
>                 URL: https://issues.apache.org/jira/browse/HIVE-9502
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-9502.1.patch, HIVE-9502.2.patch, HIVE-9502.3.patch, HIVE-9502.4.patch, alltypesparquet
>
>
> When reading a Parquet file written by Hive <= 0.12, the following error is thrown:
> {noformat}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>         at org.apache.hadoop.hive.ql.io.parquet.serde.AbstractParquetMapInspector.getMap(AbstractParquetMapInspector.java:73)
>         at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:519)
>         at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:443)
>         at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:427)
>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:582)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>         at org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>         at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>         at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
>         ... 9 more
> {noformat}
> This is because old versions of Hive (<= 0.12) write Map types using the following schema:
> {noformat}
> optional group m1 (MAP_KEY_VALUE) {
> 	repeated group map {
> 		required binary key;
> 		optional binary value;
> 	}
> }
> {noformat}
> PARQUET-113 describes new annotations for Parquet nested types:
> https://github.com/rdblue/incubator-parquet-format/blob/PARQUET-113-add-list-and-map-spec/LogicalTypes.md#maps
> With those annotations, the correct schema is now:
> {noformat}
> optional group m1f (MAP) {
> 	repeated group map (MAP_KEY_VALUE) {
> 		required binary key;
> 		optional binary value;
> 	}
> }
> {noformat}
> Hive should remain backward compatible with the old schema as well.
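
The compatibility check described above can be sketched roughly as follows. This is only an illustration, not the attached patch: `MapSchemaCompat`, `Node`, `isMap`, and `keyValue` are hypothetical names, and `Node` is a simplified stand-in for a schema group rather than the parquet-mr API. The idea is to accept the outer group as a map when it carries either the new `MAP` annotation or the old `MAP_KEY_VALUE` annotation written by Hive <= 0.12, as long as it wraps a single repeated group with a key and a value.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of a Parquet schema node; NOT the parquet-mr API.
class MapSchemaCompat {

    static final class Node {
        final String name;
        final String annotation;   // "MAP", "MAP_KEY_VALUE", or null when absent
        final boolean repeated;
        final List<Node> children; // empty for primitive fields

        Node(String name, String annotation, boolean repeated, Node... children) {
            this.name = name;
            this.annotation = annotation;
            this.repeated = repeated;
            this.children = children.length == 0
                    ? Collections.<Node>emptyList()
                    : Arrays.asList(children);
        }
    }

    // Returns true when `outer` looks like a map in either layout from this issue.
    static boolean isMap(Node outer) {
        boolean newStyle = "MAP".equals(outer.annotation);
        boolean oldStyle = "MAP_KEY_VALUE".equals(outer.annotation); // Hive <= 0.12
        if (!newStyle && !oldStyle) {
            return false;
        }
        // Both layouts wrap exactly one repeated group holding key and value.
        if (outer.children.size() != 1) {
            return false;
        }
        Node inner = outer.children.get(0);
        return inner.repeated && inner.children.size() == 2;
    }

    // Builds the inner "repeated group map { key; value; }" with an optional annotation.
    static Node keyValue(String annotation) {
        return new Node("map", annotation, true,
                new Node("key", null, false),
                new Node("value", null, false));
    }

    public static void main(String[] args) {
        Node oldLayout = new Node("m1", "MAP_KEY_VALUE", false, keyValue(null));
        Node newLayout = new Node("m1f", "MAP", false, keyValue("MAP_KEY_VALUE"));
        Node plainGroup = new Node("g", null, false, keyValue(null));
        System.out.println("old layout is map: " + isMap(oldLayout));
        System.out.println("new layout is map: " + isMap(newLayout));
        System.out.println("unannotated group is map: " + isMap(plainGroup));
    }
}
```

Checking the annotation on the outer group only keeps the sketch small; a real reader would presumably also validate the key and value field types.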



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)