hive-dev mailing list archives

From "Tongjie Chen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
Date Wed, 02 Apr 2014 17:59:17 GMT

     [ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tongjie Chen updated HIVE-6785:
-------------------------------

    Description: 
When a Hive table's SerDe is ParquetHiveSerDe while some of its partitions use a different SerDe,
and the table has string column(s), Hive generates a confusing error message:

"Failed with exception java.io.IOException:java.lang.ClassCastException: parquet.hive.serde.primitive.ParquetStringInspector
cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableTimestampObjectInspector"

This is confusing because timestamp is mentioned even though the table does not use it.
The reason is that when the table and partition SerDes differ, Hive tries to convert between
the object inspectors of the two SerDes. ParquetHiveSerDe's object inspector for the string type
is ParquetStringInspector (newly introduced), which is a subclass of neither
WritableStringObjectInspector nor JavaStringObjectInspector, the types that
ObjectInspectorConverters expects for a string-category object inspector. There is no
break statement in the STRING case, so execution falls through to the following TIMESTAMP case,
which generates the confusing error message.
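
The fall-through described above can be reproduced in isolation. The following is a minimal
sketch only: every type and method name in it (Inspector, SettableString, SettableTimestamp,
ParquetLikeStringInspector, pickConverter) is a hypothetical stand-in, not Hive's actual code,
and it assumes the STRING case only handles inspectors of the expected string types.

```java
// Hypothetical stand-ins for illustration; these are NOT Hive's real classes.
interface Inspector {}
interface SettableString extends Inspector {}
interface SettableTimestamp extends Inspector {}

// A string inspector that does not implement the string type the converter expects.
final class ParquetLikeStringInspector implements Inspector {}

final class FallThroughDemo {
    enum Category { STRING, TIMESTAMP }

    // Hypothetical converter lookup showing the missing-break pattern.
    static String pickConverter(Category category, Inspector output) {
        switch (category) {
            case STRING:
                if (output instanceof SettableString) {
                    return "string converter";
                }
                // No break or throw here, so control falls through to TIMESTAMP.
            case TIMESTAMP:
                // For an unexpected string inspector this cast fails, and the
                // resulting ClassCastException names the timestamp type.
                SettableTimestamp ts = (SettableTimestamp) output;
                return "timestamp converter";
            default:
                throw new IllegalArgumentException("unsupported category: " + category);
        }
    }

    public static void main(String[] args) {
        try {
            pickConverter(Category.STRING, new ParquetLikeStringInspector());
        } catch (ClassCastException e) {
            System.out.println("Misleading failure: " + e.getMessage());
        }
    }
}
```

In this sketch the table has no timestamp column at all, yet the exception still refers to the
timestamp inspector type, which mirrors the symptom reported here.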

See also the following Parquet issue:
https://github.com/Parquet/parquet-mr/issues/324

The fix is relatively easy: make ParquetStringInspector a subclass of JavaStringObjectInspector
instead of AbstractPrimitiveJavaObjectInspector. However, because the constructor of
JavaStringObjectInspector is package-scoped rather than public or protected, we would need to
move ParquetStringInspector into the same package as JavaStringObjectInspector.

In addition, ArrayWritableObjectInspector's setStructFieldData needs to accept List data as well,
since the corresponding setStructFieldData and create methods return a list. This is also needed
when the table SerDe is ParquetHiveSerDe and the partition SerDe is something else.
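
A minimal sketch of a setter that accepts List data, as described above, might look like the
following. The names StructSetterSketch and ArrayStruct are hypothetical stand-ins, and the
method bodies are assumptions for illustration; this is not the code in the attached patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class StructSetterSketch {
    // Hypothetical stand-in for an array-backed struct value.
    static final class ArrayStruct {
        final Object[] values;
        ArrayStruct(int fieldCount) { this.values = new Object[fieldCount]; }
    }

    // Like the create method described above, this returns a list, one slot per field.
    static Object create(int fieldCount) {
        return new ArrayList<Object>(Collections.<Object>nCopies(fieldCount, null));
    }

    // Accepts both representations, so a struct produced by create() can be updated.
    @SuppressWarnings("unchecked")
    static Object setStructFieldData(Object struct, int index, Object value) {
        if (struct instanceof List) {
            ((List<Object>) struct).set(index, value);
            return struct;
        }
        if (struct instanceof ArrayStruct) {
            ((ArrayStruct) struct).values[index] = value;
            return struct;
        }
        throw new UnsupportedOperationException(
            "unsupported struct representation: " + struct.getClass().getName());
    }

    public static void main(String[] args) {
        Object struct = create(2);
        struct = setStructFieldData(struct, 1, "value");
        System.out.println(struct);
    }
}
```

The point of the sketch is only that the setter must handle whatever its own create method
returns; otherwise the first write after create fails.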




  was:
More specifically, if the table contains string type columns, it will result in the following
exception: "Failed with exception java.io.IOException:java.lang.ClassCastException: parquet.hive.serde.primitive.ParquetStringInspector
cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableTimestampObjectInspector"

See also the following Parquet issue:
https://github.com/Parquet/parquet-mr/issues/324






> query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-6785
>                 URL: https://issues.apache.org/jira/browse/HIVE-6785
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats, Serializers/Deserializers
>    Affects Versions: 0.13.0
>            Reporter: Tongjie Chen
>         Attachments: HIVE-6785.1.patch.txt
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
