hive-dev mailing list archives

From "Ryan Blue (JIRA)" <>
Subject [jira] [Updated] (HIVE-8909) Hive doesn't correctly read Parquet nested types
Date Wed, 19 Nov 2014 01:03:51 GMT


Ryan Blue updated HIVE-8909:
    Attachment: HIVE-8909-1.patch

This patch implements the rules from PARQUET-113, which required some restructuring of the
existing converters. The included TestArrayCompatibility tests also run on trunk, so they can be
used to verify that the current array representation is unchanged and to see how trunk currently
handles Avro, Thrift, and repeated types without annotations.

This patch has the following behavioral consequences:
1. Avro and Thrift data structures that could be read previously will now match the original Avro
or Thrift type. This is the case, for example, when Avro stored an {{array<struct<f1:int>>}}:
the structure matched Hive's 3-level representation of arrays, so it could be read, but the SerDe
discarded the inner Avro record level and the type in Hive was {{array<int>}}. With this patch,
the Hive type is {{array<struct<f1:int>>}}.
2. Lists must be annotated with {{LIST}} and maps with {{MAP}}. The previous version already
assumed this. This is a safe change because all Parquet object models have correctly used these
annotations.
3. Repeated groups with 3 or more fields and repeated primitive types are now supported.
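The PARQUET-113 backward-compatibility rules for a {{LIST}}-annotated group, and the consequences listed above, can be sketched in Java as below. This is a minimal illustration under stated assumptions: the {{Node}} record, the helpers {{prim}}, {{group}}, {{list}}, {{hiveType}} and {{legacyListType}}, and the field names in {{main}} are hypothetical (this is not the parquet-mr schema API or the patch's converter code), maps are left out, and nullability is ignored because Hive type strings don't show it. {{legacyListType}} only approximates the pre-patch behavior described in item 1.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ParquetListRules {

    // Hypothetical, simplified schema node; NOT the parquet-mr Type API.
    // primitive != null marks a primitive field and holds its Hive type name;
    // listAnnotated marks a group carrying the LIST annotation.
    record Node(String name, String primitive, boolean listAnnotated, List<Node> fields) {
        boolean isPrimitive() { return primitive != null; }
    }

    static Node prim(String name, String hiveType) {
        return new Node(name, hiveType, false, List.of());
    }

    static Node group(String name, Node... fields) {
        return new Node(name, null, false, List.of(fields));
    }

    // A LIST-annotated group whose single child is the repeated field.
    static Node list(String name, Node repeated) {
        return new Node(name, null, true, List.of(repeated));
    }

    // Backward-compatibility rules: is the repeated field itself the list element?
    static boolean repeatedIsElement(Node repeated, String listName) {
        if (repeated.isPrimitive()) return true;              // repeated primitive
        if (repeated.fields().size() > 1) return true;        // group with 2+ fields
        return repeated.name().equals("array")                // older parquet-avro lists
            || repeated.name().equals(listName + "_tuple");   // older parquet-thrift lists
    }

    static String hiveType(Node n) {
        if (n.isPrimitive()) return n.primitive();
        if (n.listAnnotated()) {
            Node repeated = n.fields().get(0);
            // Otherwise the repeated group is a 3-level wrapper around one element field.
            Node element = repeatedIsElement(repeated, n.name())
                ? repeated
                : repeated.fields().get(0);
            return "array<" + hiveType(element) + ">";
        }
        // Any other (unannotated) group is read as a struct.
        return n.fields().stream()
            .map(f -> f.name() + ":" + hiveType(f))
            .collect(Collectors.joining(",", "struct<", ">"));
    }

    // Approximation of the pre-patch behavior: always assume the 3-level layout.
    static String legacyListType(Node list) {
        return "array<" + hiveType(list.fields().get(0).fields().get(0)) + ">";
    }

    public static void main(String[] args) {
        Node avro = list("mylist", group("array", prim("f1", "int")));
        Node thrift = list("mylist", group("mylist_tuple", prim("f1", "int")));
        Node threeLevel = list("mylist", group("bag", prim("array_element", "int")));
        Node repeatedInts = list("mylist", prim("element", "int"));
        Node pairs = list("mylist", group("pair", prim("a", "int"), prim("b", "string")));

        System.out.println(legacyListType(avro));   // array<int>
        System.out.println(hiveType(avro));         // array<struct<f1:int>>
        System.out.println(hiveType(thrift));       // array<struct<f1:int>>
        System.out.println(hiveType(threeLevel));   // array<int>
        System.out.println(hiveType(repeatedInts)); // array<int>
        System.out.println(hiveType(pairs));        // array<struct<a:int,b:string>>
    }
}
```

The design choice worth noting is that the 3-level layout is now only the fallback: the repeated field is checked against the compatibility rules first, which is what keeps the inner Avro record in item 1 and makes repeated primitives and multi-field repeated groups readable.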

The Hive SerDe expects an extra {{ArrayWritable}} layer from the Parquet {{Converter}}. This
expectation has been preserved: all list and map structures artificially include the extra layer
so that the SerDe doesn't need to be changed. Removing the layer, which requires changing the
SerDe, should be done as a follow-up issue.
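The extra layer means a list value reaches the SerDe as a one-element container whose only entry holds the actual elements, so the reading side has to peel that layer off first. A minimal, dependency-free sketch, assuming plain {{Object[]}} stands in for Hadoop's {{ArrayWritable}}; {{wrapList}} and {{unwrapList}} are hypothetical names, not Hive's converter or object-inspector API, and reading an empty container as a null list is an assumption of this sketch. Maps carry the same extra layer according to the comment above; they are omitted here.

```java
import java.util.Arrays;
import java.util.List;

public class ExtraLayerSketch {

    // Converter side (hypothetical): wrap the elements in one extra container,
    // mirroring the additional ArrayWritable layer the SerDe expects.
    static Object[] wrapList(List<?> elements) {
        return new Object[] { elements.toArray() };
    }

    // SerDe side (hypothetical): skip the extra layer, then return the elements.
    // In this sketch a null or empty container is read as a null list.
    static Object[] unwrapList(Object[] container) {
        if (container == null || container.length == 0) return null;
        return (Object[]) container[0];
    }

    public static void main(String[] args) {
        Object[] container = wrapList(List.of(1, 2, 3));
        System.out.println(container.length);                       // 1
        System.out.println(Arrays.toString(unwrapList(container))); // [1, 2, 3]
    }
}
```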

> Hive doesn't correctly read Parquet nested types
> ------------------------------------------------
>                 Key: HIVE-8909
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>         Attachments: HIVE-8909-1.patch
>
> Parquet's Avro and Thrift object models don't produce the same Parquet type representation
> for lists and maps that Hive does. In the Parquet community, we've defined in PARQUET-113 what
> should be written, along with backward-compatibility rules for existing data written by
> parquet-avro and parquet-thrift. We need to implement those rules in the Hive Converter classes.

This message was sent by Atlassian JIRA
