hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sathish (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files
Date Wed, 27 Aug 2014 17:43:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112528#comment-14112528
] 

Sathish commented on HIVE-7850:
-------------------------------

Thanks Ryan,
Based on your comments it looks like no particular changed needed in the Hive serde side for
handling Non nullable arrays and Parquet Avro library needs to be fixed to convert the schema
format properly.
I am planning to work on fixing the parquet Avro library for properly generating the parquet
files with schema understandable by the Hive, I will update my findings once my changes are
done.

> Hive Query failed if the data type is array<string> with parquet files
> ----------------------------------------------------------------------
>
>                 Key: HIVE-7850
>                 URL: https://issues.apache.org/jira/browse/HIVE-7850
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.14.0, 0.13.1
>            Reporter: Sathish
>            Assignee: Sathish
>              Labels: parquet, serde
>             Fix For: 0.14.0
>
>         Attachments: HIVE-7850.1.patch, HIVE-7850.2.patch, HIVE-7850.patch
>
>
> * Created a parquet file from the Avro file which have 1 array data type and rest are
primitive types. Avro Schema of the array data type. Eg: 
> {code}
> { "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, "null" ] }
> {code}
> * Created External Hive table with the Array type as below, 
> {code}
> create external table paraArray (action Array) partitioned by (partitionid int) row format
serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat'
outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; 
> alter table paraArray add partition(partitionid=1) location '/testPara';
> {code}
> * Run the following query(select action from paraArray limit 10) and the Map reduce jobs
are failing with the following exception.
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable
cannot be cast to org.apache.hadoop.io.ArrayWritable
> at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
> at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
> at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
> at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
> at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> ]
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
> ... 8 more
> {code}
> This issue has long back posted on Parquet issues list and Since this is related to Parquet
Hive serde, I have created the Hive issue here, The details and history of this information
are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message