hive-dev mailing list archives

From "Daniel Weeks (JIRA)" <>
Subject [jira] [Commented] (HIVE-7800) Parqet Column Index Access Schema Size Checking
Date Sat, 23 Aug 2014 23:44:11 GMT


Daniel Weeks commented on HIVE-7800:

The previous implementation had an issue that is triggered only in rare cases where the first
split of a task does not contain a row group. This forces the value of the input format (an
ArrayWritable) to be initialized with the width of the table, but the next row group will
produce a value only as wide as the number of columns available in the file.
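A minimal sketch of the width mismatch, assuming plain Object[] rows as a stand-in for Hive's ArrayWritable. The class and method names (WidthMismatchDemo, columnAt, isTooNarrow) are hypothetical and do not come from the Hive code base; the point is only that a table-width index applied to a file-width row goes out of bounds.

```java
// Hypothetical illustration: Object[] rows stand in for Hive's ArrayWritable.
public class WidthMismatchDemo {

    // Reads a column by its position in the table schema (column index access).
    static Object columnAt(Object[] row, int tableColumnIndex) {
        return row[tableColumnIndex]; // throws if the row is narrower than the table
    }

    // True when a row produced from a file cannot serve every table column by index.
    static boolean isTooNarrow(Object[] row, int tableWidth) {
        return row.length < tableWidth;
    }

    public static void main(String[] args) {
        int tableWidth = 4;                            // columns declared on the table
        Object[] valueHolder = new Object[tableWidth]; // value sized from the table schema
        Object[] fileRow = {"a", 1, 2.0};              // row from a file with only 3 columns

        System.out.println("holder width = " + valueHolder.length
                + ", file row width = " + fileRow.length
                + ", too narrow = " + isTooNarrow(fileRow, tableWidth));
        try {
            // Index 3 exists in the table schema but not in this file's row.
            columnAt(fileRow, tableWidth - 1);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("index out of bounds: " + e.getMessage());
        }
    }
}
```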

The new patch pads the resolved schema to ensure a matching size and masks the names of the
padded columns so there is no possibility of conflict with named columns within the file.
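The padding idea can be sketched on a plain list of column names. This is a hypothetical illustration, not the patch itself: the real change operates on the resolved Parquet schema, and the class name SchemaPadding, the method padSchema and the mask prefix "_hive_pad_" are all assumptions chosen for the example. Each placeholder name is checked against the file's own column names so a padded column cannot collide with a real one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: names and the mask prefix are illustrative, not the patch's.
public class SchemaPadding {

    // Assumed prefix for placeholder columns; unlikely to appear in a real file.
    static final String MASK_PREFIX = "_hive_pad_";

    // Pads the file's column names up to tableWidth entries with masked placeholders.
    static List<String> padSchema(List<String> fileColumns, int tableWidth) {
        List<String> padded = new ArrayList<>(fileColumns);
        Set<String> taken = new HashSet<>(fileColumns);
        int suffix = 0;
        while (padded.size() < tableWidth) {
            String candidate;
            do {
                candidate = MASK_PREFIX + suffix++;
            } while (taken.contains(candidate)); // skip any name the file already uses
            taken.add(candidate);
            padded.add(candidate);
        }
        return padded;
    }

    public static void main(String[] args) {
        List<String> fileColumns = List.of("id", "name", "score");
        // prints [id, name, score, _hive_pad_0, _hive_pad_1]
        System.out.println(padSchema(fileColumns, 5));
    }
}
```

If the file already has at least as many columns as the table, the list is returned unchanged; only narrower files are padded, which is the case that produced the size mismatch.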

> Parqet Column Index Access Schema Size Checking
> -----------------------------------------------
>                 Key: HIVE-7800
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Daniel Weeks
>            Assignee: Daniel Weeks
>         Attachments: HIVE-7800.1.patch, HIVE-7800.2.patch
> In the case that a Parquet-formatted table has partitions whose files have schemas of different
> sizes, using column index access can result in an index out of bounds exception.

This message was sent by Atlassian JIRA
