hive-issues mailing list archives

From "Ferdinand Xu (JIRA)" <>
Subject [jira] [Commented] (HIVE-10086) Hive throws error when accessing Parquet file schema using field name match
Date Mon, 30 Mar 2015 03:40:52 GMT


Ferdinand Xu commented on HIVE-10086:

Hi [], I think the git repo is not in sync with the SVN repo. You can get this commit information
from SVN; I just tried. :)

> Hive throws error when accessing Parquet file schema using field name match
> ---------------------------------------------------------------------------
>                 Key: HIVE-10086
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>             Fix For: 1.2.0
>         Attachments: HIVE-10086.5.patch, HiveGroup.parquet
> When a Hive table schema contains a portion of the schema of a Parquet file, access to the
> values should work as long as the field names match the file schema. This does not work when
> a struct<> data type is in the schema and the Hive schema contains only a subset of the struct's
> elements; Hive throws an error instead.
> This is the example and how to reproduce:
> First, create a parquet table, and add some values on it:
> {code}
> CREATE TABLE test1 (id int, name string, address struct<number:int,street:string,zip:string>) STORED AS PARQUET;
> INSERT INTO TABLE test1 SELECT 1, 'Roger', named_struct('number',8600,'street','Congress Ave.','zip','87366') FROM srcpart LIMIT 1;
> {code}
> Note: {{srcpart}} could be any table; it is only used as a row source for the INSERT ... SELECT statement.
> The above table example generates the following Parquet file schema:
> {code}
> message hive_schema {
>   optional int32 id;
>   optional binary name (UTF8);
>   optional group address {
>     optional int32 number;
>     optional binary street (UTF8);
>     optional binary zip (UTF8);
>   }
> }
> {code} 
> Afterwards, I create a table that contains just a portion of the schema and load the Parquet
> file generated above into it. Selecting the primitive column works, but a query on the struct column fails:
> {code}
> CREATE TABLE test1 (name string, address struct<street:string>) STORED AS PARQUET;
> hive> SELECT name FROM test1;
> OK
> Roger
> Time taken: 0.071 seconds, Fetched: 1 row(s)
> hive> SELECT address FROM test1;
> OK
> Failed with exception java.lang.UnsupportedOperationException: Cannot inspect
> Time taken: 0.085 seconds
> {code}
> I would expect Hive to read the fields whose names match the Parquet schema, but it throws an error instead.
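The behaviour the report expects can be illustrated with a small Python model. This is a hypothetical sketch, not Hive's Parquet reader: schemas are assumed to be plain nested dicts, and `project_schema` and `project_record` are invented names. Fields are matched by name, and nested groups are matched recursively, so `address struct<street:string>` selects only `street` from the file's `address` group.

```python
from typing import Any, Dict

# Hypothetical model: a schema maps field name -> primitive type name (str)
# or a nested dict (a Parquet group / Hive struct).
Schema = Dict[str, Any]


def project_schema(file_schema: Schema, requested: Schema) -> Schema:
    """Return the part of file_schema selected by requested, matching by name.

    Nested groups are projected recursively, so a struct that lists only some
    of a group's children selects just those children. Raises KeyError if a
    requested field name does not exist in the file schema.
    """
    result: Schema = {}
    for name, req_type in requested.items():
        file_type = file_schema[name]  # KeyError if the name does not match
        if isinstance(req_type, dict):
            if not isinstance(file_type, dict):
                raise TypeError(f"{name}: requested a struct, file has {file_type}")
            result[name] = project_schema(file_type, req_type)
        else:
            result[name] = file_type
    return result


def project_record(record: Dict[str, Any], requested: Schema) -> Dict[str, Any]:
    """Project one row (nested dicts) onto the requested schema by field name.

    A name missing from the row yields None, i.e. NULL.
    """
    out: Dict[str, Any] = {}
    for name, req_type in requested.items():
        value = record.get(name)
        if isinstance(req_type, dict) and value is not None:
            out[name] = project_record(value, req_type)
        else:
            out[name] = value
    return out


# The Parquet file schema from the report, written in this hypothetical form.
FILE_SCHEMA: Schema = {
    "id": "int32",
    "name": "binary (UTF8)",
    "address": {"number": "int32", "street": "binary (UTF8)", "zip": "binary (UTF8)"},
}

# The partial Hive table schema: name string, address struct<street:string>.
HIVE_SCHEMA: Schema = {"name": "string", "address": {"street": "string"}}

# The row inserted in the report.
ROW = {
    "id": 1,
    "name": "Roger",
    "address": {"number": 8600, "street": "Congress Ave.", "zip": "87366"},
}

print(project_schema(FILE_SCHEMA, HIVE_SCHEMA))
print(project_record(ROW, HIVE_SCHEMA))
```

In this model the partial table reads `{'name': 'Roger', 'address': {'street': 'Congress Ave.'}}` from the sample row, which is the name-matched result the report expects instead of the `Cannot inspect` exception.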

This message was sent by Atlassian JIRA
