hive-dev mailing list archives

From "Brock Noland (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-8909) Hive doesn't correctly read Parquet nested types
Date Fri, 21 Nov 2014 03:56:34 GMT

    [ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220487#comment-14220487 ]

Brock Noland commented on HIVE-8909:
------------------------------------

Not sure which test this is from:

{noformat}
Caused by: parquet.io.ParquetDecodingException: Can not read value at 0 in block 0 in file pfile:/Users/noland/workspaces/hive-apache/hive/itests/qtest/target/warehouse/parquet_jointable2/000000_0
  at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:213)
  at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:204)
  at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:102)
  at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:71)
  at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:71)
  at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
  ... 16 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
  at org.apache.hadoop.hive.ql.io.parquet.convert.HiveStructConverter.set(HiveStructConverter.java:96)
  at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$BinaryConverter.addBinary(ETypeConverter.java:219)
  at parquet.column.impl.ColumnReaderImpl$2$6.writeValue(ColumnReaderImpl.java:306)
  at parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:353)
  at parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:402)
  at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:194)
  ... 21 more
{noformat}

All Parquet tests that fail with the patch applied:
{noformat}
  <testcase name="testCliDriver_parquet_array_null_element" classname="org.apache.hadoop.hive.cli.TestCliDriver" time="4.945">
  <testcase name="testCliDriver_parquet_create" classname="org.apache.hadoop.hive.cli.TestCliDriver" time="4.416">
  <testcase name="testCliDriver_parquet_decimal" classname="org.apache.hadoop.hive.cli.TestCliDriver" time="5.478">
  <testcase name="testCliDriver_parquet_join" classname="org.apache.hadoop.hive.cli.TestCliDriver" time="8.928">
  <testcase name="testCliDriver_parquet_types" classname="org.apache.hadoop.hive.cli.TestCliDriver" time="4.094">
{noformat}

> Hive doesn't correctly read Parquet nested types
> ------------------------------------------------
>
>                 Key: HIVE-8909
>                 URL: https://issues.apache.org/jira/browse/HIVE-8909
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.1
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>         Attachments: HIVE-8909-1.patch, HIVE-8909-2.patch, HIVE-8909.2.patch, HIVE-8909.3.patch, parquet-test-data.tar.gz
>
>
> Parquet's Avro and Thrift object models don't produce the same Parquet type representation
> for lists and maps that Hive does. In the Parquet community, we've defined what should be
> written, along with backward-compatibility rules for existing data written by parquet-avro
> and parquet-thrift, in PARQUET-113. We need to implement those rules in the Hive Converter classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
