hive-issues mailing list archives

From "Vaibhav Gumashta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15082) Hive-1.2 cannot read data from complex data types with TIMESTAMP column, stored in Parquet
Date Mon, 20 Mar 2017 23:31:42 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933792#comment-15933792 ]

Vaibhav Gumashta commented on HIVE-15082:
-----------------------------------------

Removing target 1.2.2 and moving to 1.3.0. Please feel free to revert if you think this should
go in 1.2.2 (or if this gets reviewed before RC for 1.2.2 is cut).

> Hive-1.2 cannot read data from complex data types with TIMESTAMP column, stored in Parquet
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-15082
>                 URL: https://issues.apache.org/jira/browse/HIVE-15082
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Oleksiy Sayankin
>            Assignee: Oleksiy Sayankin
>            Priority: Blocker
>         Attachments: HIVE-15082.1-branch-1.2.patch, HIVE-15082.1-branch-1.2.patch, HIVE-15082-branch-1.2.patch, HIVE-15082-branch-1.patch
>
>
> *STEP 1. Create test data*
> {code:sql}
> select * from dual;
> {code}
> *EXPECTED RESULT:*
> {noformat}
> Pretty_UnIQUe_StrinG
> {noformat}
> {code:sql}
> create table test_parquet1(login timestamp) stored as parquet;
> insert overwrite table test_parquet1 select from_unixtime(unix_timestamp()) from dual;
> select * from test_parquet1 limit 1;
> {code}
> *EXPECTED RESULT:*
> No exceptions. Current timestamp as result.
> {noformat}
> 2016-10-27 10:58:19
> {noformat}
> *STEP 2. Store timestamp in array in parquet file*
> {code:sql}
> create table test_parquet2(x array<timestamp>) stored as parquet;
> insert overwrite table test_parquet2 select array(login) from test_parquet1;
> select * from test_parquet2;
> {code}
> *EXPECTED RESULT:*
> No exceptions. Current timestamp in brackets as result.
> {noformat}
> ["2016-10-27 10:58:19"]
> {noformat}
> *ACTUAL RESULT:*
> {noformat}
> ERROR [main]: CliDriver (SessionState.java:printError(963)) - Failed with exception java.io.IOException:parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs:///user/hive/warehouse/test_parquet2/000000_0
> java.io.IOException: parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs:///user/hive/warehouse/test_parquet2/000000_0
> {noformat}
> *ROOT-CAUSE:*
> The {{metadata}} {{HashMap}} is initialized incorrectly, so it is {{null}} in the enumeration {{org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter}} when the following line executes:
> {code:java}
>   boolean skipConversion = Boolean.valueOf(metadata.get(HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION.varname));
> {code}
> in element {{ETIMESTAMP_CONVERTER}}.
> The JVM throws an NPE, so the Parquet library cannot read data from the file and in turn throws
> {noformat}
> java.io.IOException:parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs:///user/hive/warehouse/test_parquet2/000000_0
> {noformat}
> *SOLUTION:*
> Perform the initialization in a separate method so that it is not overridden with a {{null}} value in this block of code:
> {code:java}
>   if (parent != null) {
>      setMetadata(parent.getMetadata());
>   }
> {code}
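
The root cause and the proposed fix in the quoted issue can be sketched in isolation. This is only an illustration under stated assumptions, not the attached patch: the class `MetadataInitSketch`, the nested `Node` class, and the methods `inheritUnguarded`, `inheritGuarded`, and `skipConversion` are hypothetical stand-ins, and the key string is assumed to be the value of `HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION.varname`. The null check shown is one possible reading of "skip overriding it with null".

```java
import java.util.HashMap;
import java.util.Map;

class MetadataInitSketch {
    // Assumed value of HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION.varname.
    static final String SKIP_CONVERSION_KEY = "hive.parquet.timestamp.skip.conversion";

    /** Hypothetical stand-in for a converter that inherits metadata from its parent. */
    static class Node {
        private Map<String, String> metadata = new HashMap<>();

        Map<String, String> getMetadata() { return metadata; }
        void setMetadata(Map<String, String> metadata) { this.metadata = metadata; }

        // Mirrors the block quoted in the issue: a parent whose metadata is null
        // overwrites this node's initialized map with null.
        void inheritUnguarded(Node parent) {
            if (parent != null) {
                setMetadata(parent.getMetadata());
            }
        }

        // One possible guard: keep the already initialized map when the parent has none.
        void inheritGuarded(Node parent) {
            if (parent != null && parent.getMetadata() != null) {
                setMetadata(parent.getMetadata());
            }
        }

        // Same shape as the failing line: throws NullPointerException if metadata is null.
        boolean skipConversion() {
            return Boolean.valueOf(metadata.get(SKIP_CONVERSION_KEY));
        }
    }

    public static void main(String[] args) {
        Node parent = new Node();
        parent.setMetadata(null); // simulate a parent that never initialized its metadata

        Node broken = new Node();
        broken.inheritUnguarded(parent);
        try {
            broken.skipConversion();
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }

        Node fixed = new Node();
        fixed.inheritGuarded(parent);
        System.out.println("guarded: skipConversion=" + fixed.skipConversion());
    }
}
```

In the guarded variant the map is never null, so a missing key simply yields `false` from `Boolean.valueOf` instead of an exception.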



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
