hive-dev mailing list archives

From "Ryan Blue (JIRA)" <>
Subject [jira] [Commented] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files
Date Tue, 26 Aug 2014 16:39:58 GMT


Ryan Blue commented on HIVE-7850:

The array fix is something we need to do in the parquet-avro module. We know it does not allow
null elements, but Hive did, which is why I mentioned it. Whether a null element is
allowed depends on the repetition of the "array_element" field. If it is repeated, then it
cannot be null. But the field directly inside the LIST has to be repeated, so to get a nullable
element you have to create a new group containing a single optional field, "array_element" (and
then name the repeated group "bag"). The easy way to support both non-null and nullable array
elements is to switch the "array_element" field between required and optional. But I don't think
we need to support non-null array elements.
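
To make the repetition rule above concrete, here is a minimal sketch in Python. It is only an illustration: the `Field` class and the `elements_nullable` helper are hypothetical names, not part of parquet-mr, parquet-avro, or Hive, and the model ignores primitive types entirely. It assumes the two layouts described above: a repeated "array_element" directly inside the LIST, or a repeated "bag" group wrapping a single "array_element" field.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """Hypothetical, simplified stand-in for a Parquet schema field."""
    name: str
    repetition: str  # "required", "optional", or "repeated"
    children: List["Field"] = field(default_factory=list)


def elements_nullable(list_field: Field) -> bool:
    """Return True if this LIST layout can represent null elements."""
    repeated = list_field.children[0]
    if repeated.repetition != "repeated":
        raise ValueError("the field directly inside a LIST must be repeated")
    if not repeated.children:
        # The repeated field is the element itself: a repeated field
        # can occur zero or more times, but no occurrence can be null.
        return False
    # The repeated field is a wrapper group; nullability is decided
    # by the repetition of the single field inside it.
    element = repeated.children[0]
    return element.repetition == "optional"


# Layout 1: repeated "array_element" directly inside the LIST.
flat = Field("action", "optional", [Field("array_element", "repeated")])

# Layout 2: repeated group "bag" wrapping an optional "array_element".
wrapped_optional = Field(
    "action", "optional",
    [Field("bag", "repeated", [Field("array_element", "optional")])],
)

# Same wrapper, but with a required element (the non-null case).
wrapped_required = Field(
    "action", "optional",
    [Field("bag", "repeated", [Field("array_element", "required")])],
)

print(elements_nullable(flat))
print(elements_nullable(wrapped_optional))
print(elements_nullable(wrapped_required))
```

With the wrapper group in place, switching between nullable and non-null elements only changes the repetition of "array_element"; the outer structure stays the same.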

If Hive has an {{array<string>}} type, are the elements nullable? If they are, then we
don't need to support the other case.

> Hive Query failed if the data type is array<string> with parquet files
> ----------------------------------------------------------------------
>                 Key: HIVE-7850
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.14.0, 0.13.1
>            Reporter: Sathish
>            Assignee: Sathish
>              Labels: parquet, serde
>             Fix For: 0.14.0
>         Attachments: HIVE-7850.1.patch, HIVE-7850.2.patch, HIVE-7850.patch
> * Created a Parquet file from an Avro file that has one array-typed field; the rest are
primitive types. The Avro schema of the array field, e.g.:
> {code}
> { "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, "null" ] }
> {code}
> * Created an external Hive table with the Array type as below:
> {code}
> create external table paraArray (action Array) partitioned by (partitionid int) row format
serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat'
outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; 
> alter table paraArray add partition(partitionid=1) location '/testPara';
> {code}
> * Ran the following query (select action from paraArray limit 10); the MapReduce jobs
fail with the following exception.
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable
cannot be cast to
> at parquet.hive.serde.ParquetHiveArrayInspector.getList(
> at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(
> at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(
> at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(
> at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(
> at
> at
> at org.apache.hadoop.mapred.MapTask.runOldMapper(
> at
> at org.apache.hadoop.mapred.Child$
> at Method)
> at
> at
> at org.apache.hadoop.mapred.Child.main(
> ]
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(
> at
> ... 8 more
> {code}
> This issue was posted on the Parquet issues list long ago. Since it is related to the Parquet
Hive SerDe, I have created the Hive issue here. The details and history of this issue
are shown in the link here

This message was sent by Atlassian JIRA
