hive-dev mailing list archives

From "Krishna Kumar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-956) Add support of columnar binary serde
Date Thu, 09 Jun 2011 07:02:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046362#comment-13046362 ]

Krishna Kumar commented on HIVE-956:
------------------------------------

bq. can warnedOnceNullMapKey be removed?

It is easy to remove warnedOnceNullMapKey if either of the following is acceptable:
 - logging a warning message every time a null map key is encountered
 - logging a warning message only once per process execution (by making it a class
static)

The current behavior is to log a warning message once per instance of LazyBinarySerde. If
we want to retain that behavior, the flag has to be passed in as a parameter (or handled by
a more complicated mechanism such as a callback or a thread-local).
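
To make the options concrete, here is a minimal sketch of a warn-once flag. WarnOnce and
WarnOnceDemo are hypothetical names used only for illustration, not classes from the patch;
the sink parameter stands in for whatever logger the serde uses.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical helper (not from the patch): emits a warning at most once
// per WarnOnce object.
final class WarnOnce {
  private final AtomicBoolean warned = new AtomicBoolean(false);
  private final Consumer<String> sink; // stand-in for the real logger

  WarnOnce(Consumer<String> sink) {
    this.sink = sink;
  }

  /** Emits the message on the first call only; returns true if it was emitted. */
  boolean warn(String message) {
    if (warned.compareAndSet(false, true)) {
      sink.accept(message);
      return true;
    }
    return false;
  }
}

final class WarnOnceDemo {
  // Once per process: a single class-static flag shared by every caller.
  static final WarnOnce PER_PROCESS = new WarnOnce(System.err::println);

  public static void main(String[] args) {
    // Once per serde instance: each instance owns a flag and passes it
    // as a parameter to the code that may meet a null map key.
    WarnOnce perInstance = new WarnOnce(System.err::println);
    for (int i = 0; i < 3; i++) {
      perInstance.warn("Null map key encountered"); // printed on the first pass only
    }
  }
}
```

Where the WarnOnce object lives decides the scope: a field on the serde instance keeps the
current once-per-instance behavior, while a static field gives once per process.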

bq. A 0 should mean an empty string. '\N' means null in Hive. Can you take a look at how LazyBinarySerde
handles null value, and do the same thing here.

Not sure I understand. Isn't the serde free to encode null/empty values in any way it sees
fit? '\N' means null only in the context of a specific serde, for instance the columnar
serde. LazyBinarySerde uses a null byte for every 8 fields to encode nulls (and a string
length as part of the data to encode empty strings). IMO, neither of these options is well
suited for lazybinarycolumnar: the '\N' marker brings escaping complexities, and the null
byte per 8 fields assumes storage by rows, whereas the storage is now by columns. I have taken
the approach that a 0-length column cell value indicates a null (nulls being a very common
case, they should have minimal overhead). For empty strings, encoding the string length as
part of the cell value is still an option, but I think it adds too much overhead to the
non-empty cells (as shown in my tests on the same dataset).
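
As a rough sketch of the encoding side of this approach: CellEncoder and the marker value
0xBF are assumptions made for illustration and are not taken from the patch. 0xBF is chosen
here because it is a UTF-8 continuation byte and so cannot be a complete one-byte string.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical encoder (not from the patch) for a single string cell.
final class CellEncoder {
  // Assumed marker for the empty string; a lone 0xBF is never valid UTF-8.
  static final byte EMPTY_STRING_MARKER = (byte) 0xBF;

  /** Returns the bytes stored in the column cell for the given string value. */
  static byte[] encodeString(String value) {
    if (value == null) {
      return new byte[0]; // null: zero-length cell, no overhead
    }
    if (value.isEmpty()) {
      return new byte[] { EMPTY_STRING_MARKER }; // empty string: single marker byte
    }
    return value.getBytes(StandardCharsets.UTF_8); // non-empty: raw bytes, no length prefix
  }

  public static void main(String[] args) {
    System.out.println(encodeString(null).length);
    System.out.println(encodeString("").length);
    System.out.println(encodeString("hive").length);
  }
}
```

With this layout non-empty cells carry no extra bytes, nulls cost nothing, and only the
empty string pays one byte.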

The implementation is fine, I think. It first checks whether the field is a primitive (for
non-primitives, the input byte stream length is also the data length), and then whether the
field is a string of length 1 whose value is the special marker, etc.
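
The order of checks could look roughly like the sketch below. CellClassifier, CellState and
the marker value 0xBF are hypothetical and not taken from the patch; putting the zero-length
null check first is also an assumption.

```java
// Hypothetical reader-side check (not from the patch) for one column cell.
final class CellClassifier {
  enum CellState { NULL, EMPTY_STRING, DATA }

  // Assumed marker byte for the empty string.
  static final byte EMPTY_STRING_MARKER = (byte) 0xBF;

  static CellState classify(boolean isPrimitive, boolean isString,
                            byte[] buf, int start, int length) {
    if (length == 0) {
      return CellState.NULL; // a 0-length cell means null
    }
    // Only primitive string fields can carry the empty-string marker.
    if (isPrimitive && isString && length == 1 && buf[start] == EMPTY_STRING_MARKER) {
      return CellState.EMPTY_STRING;
    }
    // Everything else, including all non-primitives, is ordinary data
    // whose length is the cell length.
    return CellState.DATA;
  }

  public static void main(String[] args) {
    byte[] marker = { EMPTY_STRING_MARKER };
    System.out.println(classify(true, true, marker, 0, 1));
    System.out.println(classify(false, false, marker, 0, 1));
    System.out.println(classify(true, true, new byte[0], 0, 0));
  }
}
```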

Will do the MapEqualComparer splitting.


> Add support of columnar binary serde
> ------------------------------------
>
>                 Key: HIVE-956
>                 URL: https://issues.apache.org/jira/browse/HIVE-956
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: Krishna Kumar
>         Attachments: HIVE.956.patch.0, HIVE.956.patch.1, HIVE.956.patch.2
>
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
