hive-issues mailing list archives

From "Nandor Kollar (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-21987) Hive is unable to read Parquet int32 annotated with decimal
Date Tue, 23 Jul 2019 13:58:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-21987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nandor Kollar updated HIVE-21987:
---------------------------------
    Attachment: part-00000-e5287735-8dcf-4dda-9c6e-4d5c98dc15f2-c000.snappy.parquet

> Hive is unable to read Parquet int32 annotated with decimal
> -----------------------------------------------------------
>
>                 Key: HIVE-21987
>                 URL: https://issues.apache.org/jira/browse/HIVE-21987
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Nandor Kollar
>            Assignee: Marta Kuczora
>            Priority: Major
>         Attachments: part-00000-e5287735-8dcf-4dda-9c6e-4d5c98dc15f2-c000.snappy.parquet
>
>
> When I tried to read a Parquet file from a Hive table (with the Tez execution engine) that has a small decimal column, I got the following exception:
> {code}
> Caused by: java.lang.UnsupportedOperationException: org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$8$1
> 	at org.apache.parquet.io.api.PrimitiveConverter.addInt(PrimitiveConverter.java:98)
> 	at org.apache.parquet.column.impl.ColumnReaderImpl$2$3.writeValue(ColumnReaderImpl.java:248)
> 	at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:367)
> 	at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406)
> 	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:226)
> 	... 28 more
> {code}
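
The top frame of the trace is {{PrimitiveConverter.addInt}}. In parquet-mr, the default callbacks of {{PrimitiveConverter}} (such as {{addInt}}, {{addLong}} and {{addBinary}}) throw {{UnsupportedOperationException}} with the converter's class name as the message, which matches the message above. The trace therefore suggests that the Hive converter {{ETypeConverter$8$1}} accepts decimals only through the binary callback and never overrides {{addInt}}. The sketch below is a self-contained model of that failure, not the real code: {{BaseConverter}}, {{BinaryOnlyDecimalConverter}} and {{Hive21987Repro}} are hypothetical stand-ins for the parquet-mr and Hive classes.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical stand-in for parquet-mr's PrimitiveConverter: any callback a
// subclass does not override rejects the value, using the class name as message.
abstract class BaseConverter {
    public void addInt(int value) {
        throw new UnsupportedOperationException(getClass().getName());
    }

    public void addLong(long value) {
        throw new UnsupportedOperationException(getClass().getName());
    }

    public void addBinary(byte[] value) {
        throw new UnsupportedOperationException(getClass().getName());
    }
}

// Hypothetical decimal converter that only understands binary
// (big-endian two's-complement) unscaled values, as the trace suggests.
class BinaryOnlyDecimalConverter extends BaseConverter {
    private final int scale;
    BigDecimal last; // most recently decoded value

    BinaryOnlyDecimalConverter(int scale) {
        this.scale = scale;
    }

    @Override
    public void addBinary(byte[] value) {
        last = new BigDecimal(new BigInteger(value), scale);
    }
}

class Hive21987Repro {
    public static void main(String[] args) {
        BinaryOnlyDecimalConverter converter = new BinaryOnlyDecimalConverter(2);
        converter.addBinary(BigInteger.valueOf(1234).toByteArray()); // decodes to 12.34
        System.out.println(converter.last);
        try {
            converter.addInt(1234); // an int32-backed decimal(4, 2) value
        } catch (UnsupportedOperationException e) {
            System.out.println("UnsupportedOperationException: " + e.getMessage());
        }
    }
}
```

Running {{Hive21987Repro}} prints 12.34 for the binary value and then reports the {{UnsupportedOperationException}} for the int32 value, mirroring the failure in the trace.
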
> Steps to reproduce:
> - Create a Hive table with a single decimal(4, 2) column
> - Create a Parquet file with an int32 column annotated with the decimal(4, 2) logical type, and put it into the location of the table created above (or use the attached Parquet file; in that case, the column must be named 'd' so that the Hive schema matches the Parquet schema in the file)
> - Execute a {{select *}} on this table
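
In the reproduction above, a decimal(4, 2) stored as int32 holds its unscaled value, so the stored integer 1234 stands for 12.34. A minimal sketch, assuming a reader that receives the raw value through one callback per physical type, of how each representation could be decoded; {{ParquetDecimalDecoder}} and {{DecimalDemo}} are hypothetical names, not Hive or parquet-mr APIs.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical decoder (not a Hive or parquet-mr class) for a DECIMAL(precision, scale)
// column, with one method per physical representation allowed by the Parquet spec.
final class ParquetDecimalDecoder {
    private final int precision;
    private final int scale;

    ParquetDecimalDecoder(int precision, int scale) {
        this.precision = precision;
        this.scale = scale;
    }

    // int32-backed DECIMAL: the stored int is the unscaled value.
    BigDecimal fromInt32(int unscaled) {
        return checked(BigDecimal.valueOf(unscaled, scale));
    }

    // int64-backed DECIMAL: the stored long is the unscaled value.
    BigDecimal fromInt64(long unscaled) {
        return checked(BigDecimal.valueOf(unscaled, scale));
    }

    // binary / fixed_len_byte_array DECIMAL: big-endian two's-complement unscaled value.
    BigDecimal fromBinary(byte[] unscaled) {
        return checked(new BigDecimal(new BigInteger(unscaled), scale));
    }

    // Reject values whose unscaled magnitude needs more than `precision` digits.
    private BigDecimal checked(BigDecimal value) {
        BigInteger limit = BigInteger.TEN.pow(precision);
        if (value.unscaledValue().abs().compareTo(limit) >= 0) {
            throw new IllegalArgumentException(
                value + " does not fit decimal(" + precision + ", " + scale + ")");
        }
        return value;
    }
}

class DecimalDemo {
    public static void main(String[] args) {
        ParquetDecimalDecoder decoder = new ParquetDecimalDecoder(4, 2);
        System.out.println(decoder.fromInt32(1234));                            // 12.34
        System.out.println(decoder.fromInt64(-5L));                             // -0.05
        System.out.println(decoder.fromBinary(new byte[] {0x04, (byte) 0xD2})); // 12.34
    }
}
```

All three representations go through the same range check, so an int32 or int64 value that does not fit decimal(4, 2), such as an unscaled 10000, is rejected instead of silently becoming 100.00.
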
> Also, I'm afraid that similar problems can happen with int64-backed decimals too. The [Parquet specification|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md] allows both int32 and int64 as the physical type of a decimal.
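
The Parquet LogicalTypes specification ties each DECIMAL physical type to a precision range: int32 for precision 1 to 9, int64 for 1 to 18, fixed_len_byte_array of length n for at most floor(log10(2^(8n-1) - 1)) digits, and binary without a limit. A sketch of those rules follows; {{DecimalPhysicalTypes}} and {{PhysicalTypeDemo}} are illustrative names only. For the decimal(4, 2) column in this issue every one of these types is legal, so a reader has to be prepared for int32 and int64 as well as the byte-array forms.

```java
import java.math.BigInteger;

// Illustrative helpers (hypothetical names) encoding the DECIMAL physical-type
// rules from the Parquet LogicalTypes specification.
final class DecimalPhysicalTypes {
    private DecimalPhysicalTypes() {}

    // int32 may back a DECIMAL with precision 1..9.
    static boolean allowsInt32(int precision) {
        return precision >= 1 && precision <= 9;
    }

    // int64 may back a DECIMAL with precision 1..18.
    static boolean allowsInt64(int precision) {
        return precision >= 1 && precision <= 18;
    }

    // Digits a fixed_len_byte_array of length n can hold:
    // floor(log10(2^(8n - 1) - 1)), computed exactly as (digit count - 1).
    static int maxFixedLenPrecision(int n) {
        BigInteger max = BigInteger.ONE.shiftLeft(8 * n - 1).subtract(BigInteger.ONE);
        return max.toString().length() - 1;
    }

    // Smallest fixed_len_byte_array length that can hold the given precision.
    static int minFixedLen(int precision) {
        if (precision < 1) {
            throw new IllegalArgumentException("precision must be >= 1");
        }
        int n = 1;
        while (maxFixedLenPrecision(n) < precision) {
            n++;
        }
        return n;
    }
}

class PhysicalTypeDemo {
    public static void main(String[] args) {
        int precision = 4; // decimal(4, 2) from the reproduction steps
        System.out.println("INT32 allowed: " + DecimalPhysicalTypes.allowsInt32(precision));
        System.out.println("INT64 allowed: " + DecimalPhysicalTypes.allowsInt64(precision));
        System.out.println("Min FIXED_LEN_BYTE_ARRAY length: "
            + DecimalPhysicalTypes.minFixedLen(precision));
    }
}
```

As a consistency check, a 4-byte array yields 9 digits and an 8-byte array 18 digits, the same limits the specification gives for int32 and int64.
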



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
