hive-issues mailing list archives

From Sergio Peña (JIRA) <j...@apache.org>
Subject [jira] [Updated] (HIVE-10086) Hive throws error when accessing Parquet file schema using field name match
Date Thu, 26 Mar 2015 15:59:53 GMT

     [ https://issues.apache.org/jira/browse/HIVE-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-10086:
-------------------------------
    Attachment: HIVE-10086.2.patch

> Hive throws error when accessing Parquet file schema using field name match
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-10086
>                 URL: https://issues.apache.org/jira/browse/HIVE-10086
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-10086.2.patch, HiveGroup.parquet
>
>
> When a Hive table schema contains only a portion of a Parquet file's schema, accessing the values should work as long as the field names match. This does not work when a struct<> data type is in the schema and the Hive schema contains only some of the struct's elements; Hive throws an error instead.
> Here is an example and how to reproduce it:
> First, create a Parquet table and add some values to it:
> {code}
> CREATE TABLE test1 (id int, name string, address struct<number:int,street:string,zip:string>) STORED AS PARQUET;
> INSERT INTO TABLE test1 SELECT 1, 'Roger', named_struct('number',8600,'street','Congress Ave.','zip','87366') FROM srcpart LIMIT 1;
> {code}
> Note: {{srcpart}} could be any table. It is only used to drive the INSERT ... SELECT statement.
> The above table example generates the following Parquet file schema:
> {code}
> message hive_schema {
>   optional int32 id;
>   optional binary name (UTF8);
>   optional group address {
>     optional int32 number;
>     optional binary street (UTF8);
>     optional binary zip (UTF8);
>   }
> }
> {code} 
> Afterwards, I create a table that contains just a portion of the schema and load the Parquet file generated above into it. Querying the struct column then fails:
> {code}
> CREATE TABLE test1 (name string, address struct<street:string>) STORED AS PARQUET;
> LOAD DATA LOCAL INPATH '/tmp/HiveGroup.parquet' OVERWRITE INTO TABLE test1;
> hive> SELECT name FROM test1;
> OK
> Roger
> Time taken: 0.071 seconds, Fetched: 1 row(s)
> hive> SELECT address FROM test1;
> OK
> Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.IntWritable
> Time taken: 0.085 seconds
> {code}
> I would expect Hive to read the fields whose names match the Parquet file schema, but it throws an error instead.
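
The `IntWritable` in the error is consistent with Hive pairing the declared `street` field with the first field of the `address` group (`number`, an int32) by position, instead of matching it by name. That is an inference from the error message, not a confirmed diagnosis. The sketch below contrasts the two strategies on the example row. It is a plain-Python model only: `FILE_ADDRESS`, `FILE_ROW`, `HIVE_ADDRESS`, `project_by_position` and `project_by_name` are hypothetical names, not Hive or Parquet APIs. Returning `None` for a field absent from the file is an assumption of the sketch.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model of a schema: an ordered list of (field_name, type_label).
# The type labels are illustrative strings, not Parquet or Hive API types.
Fields = List[Tuple[str, str]]

# The "address" group as written in the Parquet file, in file order.
FILE_ADDRESS: Fields = [("number", "int32"), ("street", "binary"), ("zip", "binary")]

# One value of "address" as read from the file, indexed by file position.
FILE_ROW: List[object] = [8600, "Congress Ave.", "87366"]

# The second Hive table declares only a subset: address struct<street:string>.
HIVE_ADDRESS: Fields = [("street", "string")]


def project_by_position(hive_fields: Fields, file_row: List[object]) -> Dict[str, object]:
    """Pair the i-th declared field with the i-th file value (suspected failure mode)."""
    return {name: file_row[i] for i, (name, _) in enumerate(hive_fields)}


def project_by_name(
    hive_fields: Fields, file_fields: Fields, file_row: List[object]
) -> Dict[str, Optional[object]]:
    """Look up each declared field by name in the file schema (expected behavior)."""
    position = {name: i for i, (name, _) in enumerate(file_fields)}
    result: Dict[str, Optional[object]] = {}
    for name, _ in hive_fields:
        i = position.get(name)
        # Assumption of this sketch: a field missing from the file reads as NULL (None).
        result[name] = file_row[i] if i is not None else None
    return result


if __name__ == "__main__":
    print(project_by_position(HIVE_ADDRESS, FILE_ROW))  # {'street': 8600}
    print(project_by_name(HIVE_ADDRESS, FILE_ADDRESS, FILE_ROW))  # {'street': 'Congress Ave.'}
```

With positional pairing, the string field `street` receives the integer 8600, which mirrors a string inspector being handed an `IntWritable`. Name-based lookup returns the value the reporter expected.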



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
