hive-issues mailing list archives

From "Dmitry Romanenko (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-21987) Hive is unable to read Parquet int32 annotated with decimal
Date Sat, 05 Oct 2019 00:51:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-21987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944940#comment-16944940 ]

Dmitry Romanenko commented on HIVE-21987:
-----------------------------------------

Any chance this will be backported to the 3.x tree? This seems like quite a major problem
affecting multiple trees.

> Hive is unable to read Parquet int32 annotated with decimal
> -----------------------------------------------------------
>
>                 Key: HIVE-21987
>                 URL: https://issues.apache.org/jira/browse/HIVE-21987
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Nándor Kollár
>            Assignee: Marta Kuczora
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21987.1.patch, HIVE-21987.2.patch, HIVE-21987.3.patch, HIVE-21987.4.patch,
>                      HIVE-21987.5.patch, part-00000-e5287735-8dcf-4dda-9c6e-4d5c98dc15f2-c000.snappy.parquet
>
>
> When I tried to read a Parquet file through a Hive table (using the Tez execution engine)
> that has a small decimal column, I got the following exception:
> {code}
> Caused by: java.lang.UnsupportedOperationException: org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$8$1
> 	at org.apache.parquet.io.api.PrimitiveConverter.addInt(PrimitiveConverter.java:98)
> 	at org.apache.parquet.column.impl.ColumnReaderImpl$2$3.writeValue(ColumnReaderImpl.java:248)
> 	at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:367)
> 	at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406)
> 	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:226)
> 	... 28 more
> {code}
> Steps to reproduce:
> - Create a Hive table with a single decimal(4, 2) column.
> - Create a Parquet file with an int32 column annotated with the decimal(4, 2) logical type
>   and put it into the location of the table created in the previous step. Alternatively,
>   use the attached Parquet file; in that case the column must be named 'd' so that the
>   Hive schema matches the Parquet schema in the file.
> - Execute a {{select *}} on this table.
> Also, I'm afraid that similar problems can happen with int64 decimals too. The [Parquet specification
> | https://github.com/apache/parquet-format/blob/master/LogicalTypes.md] allows both of these
> cases.
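
For readers unfamiliar with the encoding discussed in the issue: the Parquet
LogicalTypes document allows DECIMAL to annotate int32 (precision up to 9) and
int64 (precision up to 18), and the stored integer is the unscaled value. Below
is a minimal Python sketch of that decoding. It is not Hive's converter code;
the helper name decode_int_decimal and the MAX_PRECISION table are hypothetical
and exist only for this illustration.

```python
from decimal import Decimal

# Largest decimal precision each integer physical type may carry,
# as described in the Parquet LogicalTypes document.
MAX_PRECISION = {"int32": 9, "int64": 18}


def decode_int_decimal(unscaled, precision, scale, physical_type="int32"):
    """Hypothetical helper: convert an unscaled integer to a Decimal.

    The logical value is unscaled * 10**(-scale).
    """
    if not 1 <= precision <= MAX_PRECISION[physical_type]:
        raise ValueError(
            f"precision {precision} does not fit in {physical_type}")
    # Assumption for this sketch: scale is between 0 and precision.
    if not 0 <= scale <= precision:
        raise ValueError(f"invalid scale {scale} for precision {precision}")
    if abs(unscaled) >= 10 ** precision:
        raise ValueError(f"{unscaled} exceeds precision {precision}")
    # scaleb shifts the exponent without changing the digits.
    return Decimal(unscaled).scaleb(-scale)


# A decimal(4, 2) column stored as int32 holds 12.34 as the integer 1234.
print(decode_int_decimal(1234, precision=4, scale=2))
```

The same function covers the int64 case by passing physical_type="int64",
which is the second situation the reporter expected to be affected.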



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
