hive-dev mailing list archives

From "Jonathan Chang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-2333) LazySimpleSerDe does not properly handle arrays / escape control characters
Date Tue, 02 Aug 2011 19:30:27 GMT
LazySimpleSerDe does not properly handle arrays / escape control characters
---------------------------------------------------------------------------

                 Key: HIVE-2333
                 URL: https://issues.apache.org/jira/browse/HIVE-2333
             Project: Hive
          Issue Type: Bug
            Reporter: Jonathan Chang


LazySimpleSerDe, the default SerDe for Hive, is severely broken:

* Empty arrays are serialized as an empty string. Hence array(array()) is indistinguishable
from both array(array(array())) and array().
* Similarly, empty strings are serialized as an empty string. Hence array('') is also indistinguishable
from an empty array.
* If the serialized string equals the null sequence, it is ambiguous whether it represents
an array with a single null element or a null array.
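
The ambiguity can be reproduced with a toy model of delimiter-joined serialization. This is only a sketch, not Hive's actual code: the `serialize` function, the `\N` null marker, and the per-level separators are assumptions chosen for illustration.

```python
# Toy model of delimiter-joined text serialization; NOT Hive's actual code.
NULL_SEQUENCE = "\\N"                    # assumed null marker
SEPARATORS = ["\x02", "\x03", "\x04"]    # assumed separators, one per nesting level


def serialize(value, level=0):
    """Serialize None, a string, or a (possibly nested) list of those."""
    if value is None:
        return NULL_SEQUENCE
    if isinstance(value, str):
        return value
    # A list becomes its serialized elements joined by this level's separator.
    return SEPARATORS[level].join(serialize(v, level + 1) for v in value)


# Values that should be distinct but collapse to the same serialized string.
empty_like = [[], [[]], [[[]]], [""]]
null_like = [None, [None]]

for v in empty_like + null_like:
    print(repr(v), "->", repr(serialize(v)))
```

In this model every value in `empty_like` serializes to the empty string and both values in `null_like` serialize to the null marker, so a reader cannot recover which one was written.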

It also mishandles control characters inside string values:

> select array('foo\002bar') from tmp;
...
["foo","bar"]

> select array('foo\001bar') from tmp;
...
["foo"]
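
A plausible reading of these results is that the control characters collide with the delimiters: \001 is Hive's default field delimiter and \002 its default collection-item delimiter. The sketch below reproduces both outputs with a toy round trip that writes values without escaping. The `write_row` and `read_row` helpers are hypothetical, and the claim that this is the actual mechanism behind the bug is an assumption.

```python
# Toy model of an unescaped delimited round trip; NOT Hive's actual code.
FIELD_SEP = "\x01"   # assumed column delimiter
ITEM_SEP = "\x02"    # assumed array-element delimiter


def write_row(array_column):
    """Serialize a one-column row holding an array of strings, with no escaping."""
    return ITEM_SEP.join(array_column)


def read_row(line):
    """Read the row back: keep the first column, then split it into elements."""
    first_column = line.split(FIELD_SEP)[0]
    return first_column.split(ITEM_SEP)


print(read_row(write_row(["foo\x02bar"])))  # ['foo', 'bar']
print(read_row(write_row(["foo\x01bar"])))  # ['foo']
```

Under these assumptions, an embedded \002 splits one element into two, and an embedded \001 ends the column early so the rest of the string is dropped.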

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
