hive-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-5320) Querying a table with nested struct type over JSON data results in errors
Date Mon, 23 Sep 2013 16:26:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774693#comment-13774693 ]

Ashutosh Chauhan commented on HIVE-5320:
----------------------------------------

[~ctang.cloudera] I am not sure it is easy to detect such a badly behaving serde. This
is not something easily enforceable. So the only thing I can see is to improve our documentation
so that serde writers are well aware of this behavior. Let's close this one as won't fix and
improve the documentation on cwiki.
                
> Querying a table with nested struct type over JSON data results in errors
> -------------------------------------------------------------------------
>
>                 Key: HIVE-5320
>                 URL: https://issues.apache.org/jira/browse/HIVE-5320
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.9.0
>            Reporter: Chaoyu Tang
>            Assignee: Chaoyu Tang
>         Attachments: HIVE-5320.patch
>
>
> Querying a table with nested_struct datatype like
> ==
> create table nest_struct_tbl (col1 string, col2 array<struct<a1:string, a2:array<struct<b1:int, b2:string, b3:string>>>>) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
> ==
> over JSON data causes errors, including java.lang.IndexOutOfBoundsException or corrupted data.
> The JsonSerDe used is json-serde-1.1.4.jar/json-serde-1.1.4-jar-dependencies.jar.
> The cause is that the method:
> public List<Object> getStructFieldsDataAsList(Object o) 
> in JsonStructObjectInspector.java 
> returns a list that references a static ArrayList, "values".
> So the local variable 'list' in the serialize method of Hive's LazySimpleSerDe class holds the same reference in each of its recursive calls, and its element values keep being overwritten in the STRUCT case.
> Solutions:
> 1. Fix in JsonSerDe: change the field 'values' in java.org.openx.data.jsonserde.objectinspector.JsonStructObjectInspector.java to instance scope.
> Filed a ticket with JsonSerDe (https://github.com/rcongiu/Hive-JSON-Serde/issues/31)
> 2. Ideally, in the serialize method of class LazySimpleSerDe, we should defensively save a copy of the list returned by list = soi.getStructFieldsDataAsList(obj), where soi is the instance of JsonStructObjectInspector, so that the recursive calls of serialize work properly regardless of the extended SerDe implementation.
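
The aliasing problem and the defensive copy described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not the Hive or JsonSerDe code: StructInspector, SharedListInspector and SharedListDemo are hypothetical names, nested structs are modeled as nested java.util.List values, and only the method name getStructFieldsDataAsList is borrowed from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a struct object inspector; not the real Hive interface.
interface StructInspector {
    List<Object> getStructFieldsDataAsList(Object struct);
}

// Mimics the reported behavior: every call refills and returns one static list.
class SharedListInspector implements StructInspector {
    private static final List<Object> values = new ArrayList<>();

    @Override
    public List<Object> getStructFieldsDataAsList(Object struct) {
        values.clear();
        values.addAll((List<?>) struct);
        return values; // same reference on every call
    }
}

class SharedListDemo {
    // Recursively renders a struct; any List value is treated as a nested struct.
    static String serialize(Object obj, StructInspector soi, boolean defensiveCopy) {
        if (!(obj instanceof List)) {
            return String.valueOf(obj);
        }
        List<Object> list = soi.getStructFieldsDataAsList(obj);
        if (defensiveCopy) {
            // Option 2 from the description: keep a private copy before recursing.
            list = new ArrayList<>(list);
        }
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < list.size(); i++) {
            if (i > 0) {
                sb.append(",");
            }
            // Without the copy, this recursive call overwrites the contents of 'list'.
            sb.append(serialize(list.get(i), soi, defensiveCopy));
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.asList("x", Arrays.asList("a", "b", "c"), "y");
        StructInspector soi = new SharedListInspector();
        System.out.println(serialize(row, soi, false)); // {x,{a,b,c},c}  (last field corrupted)
        System.out.println(serialize(row, soi, true));  // {x,{a,b,c},y}
    }
}
```

In this sketch the uncopied run loses the outer field "y" because the recursive call refills the shared list. Making 'values' an instance field (option 1) would not be enough here, since one inspector instance serves both nesting levels; the sketch would need a fresh list per call or the caller-side copy.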

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
