hadoop-hive-dev mailing list archives

From "Raghotham Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-337) LazySimpleSerDe should support array and map types
Date Wed, 11 Mar 2009 22:42:50 GMT

    [ https://issues.apache.org/jira/browse/HIVE-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681073#action_12681073 ]

Raghotham Murthy commented on HIVE-337:

I see that the trend is to keep writing new SerDes either for performance (MetadataTyped...
and possibly others) or for expressibility (DynamicSerDe etc). Eventually, most of these SerDes
are not used at all other than for backward compatibility with existing data. Isn't it better
to find a balance between performance and expressibility in a single serde which can be
used in general? Of course, if people want more performance/expressibility they can use/write
other serdes which use binary formats etc.

Using array begin and end markers doesn't decrease human-readability imo (people are fine with
reading json, right?) and it allows expressing nested structures. I am not sure there is a
disadvantage to this. What I was hoping was that LazySimpleSerDe becomes the default SerDe
for most requirements. 
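
To illustrate the point about nesting, here is a minimal sketch of how begin and end markers let a text format express an array of maps. The marker characters ([ ] for arrays, { } for maps, "," and ":" as separators) and the function names are assumptions made for this example; this is not LazySimpleSerDe's actual format or code.

```python
def parse_value(text, pos=0):
    """Parse one value starting at text[pos]; return (value, next_pos)."""
    if text[pos] == "[":                      # array begin marker
        items, pos = [], pos + 1
        while text[pos] != "]":               # until array end marker
            item, pos = parse_value(text, pos)
            items.append(item)
            if text[pos] == ",":
                pos += 1
        return items, pos + 1                 # step past the end marker
    if text[pos] == "{":                      # map begin marker
        entries, pos = {}, pos + 1
        while text[pos] != "}":               # until map end marker
            key, pos = parse_value(text, pos)
            pos += 1                          # step past ':' (input assumed well formed)
            value, pos = parse_value(text, pos)
            entries[key] = value
            if text[pos] == ",":
                pos += 1
        return entries, pos + 1
    # Scalar: everything up to the next structural character.
    end = pos
    while end < len(text) and text[end] not in ",:]}":
        end += 1
    return text[pos:end], end


def parse(text):
    """Parse a whole field; reject trailing characters."""
    value, pos = parse_value(text)
    if pos != len(text):
        raise ValueError("unexpected trailing input at offset %d" % pos)
    return value


rows = parse("[{a:1,b:2},{c:3}]")   # an array of maps in one text field
```

Because each level announces its own start and end, the nesting depth is not fixed in advance, which is what a fixed separator per level cannot offer. The sketch assumes well-formed input and does no escaping of marker characters inside scalars.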

Specifically, I have a requirement for arrays of maps. If you don't provide support for that
in LazySimpleSerDe (which is probably not a big change, I might be mistaken though), then
for my requirement, we would have to go through the process of creating a new SerDe. And once
we create that serde I'd rather use it than LazySimpleSerDe for all of my future requirements.
I am guessing that pretty soon we would have to deprecate LazySimpleSerDe in favor of this
new serde because of its expressibility.

Regarding automatically detecting the serialization format for arrays in the data, maybe I
am mistaken, but aren't you already using some logic to create LazySimpleSerDe when the metastore
has MetadataTypedColumnSetSerDe for that table? In that same logic, can't you add a parameter
to the lazy serde to indicate which array serialization format to use?
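
A rough sketch of what I mean by a parameter: one reader is chosen from the table properties, defaulting to the existing separator-only layout so old data keeps working. The property name "array.format", its two values, and all function names here are hypothetical and made up for this example; they are not existing Hive settings or APIs.

```python
def read_delimited(text, sep=","):
    """Separator-only layout: a,b,c"""
    return text.split(sep) if text else []


def read_bracketed(text, sep=","):
    """Layout with begin and end markers: [a,b,c]"""
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected array markers around %r" % text)
    return read_delimited(text[1:-1], sep)


# Hypothetical format names mapped to their readers.
ARRAY_READERS = {
    "delimited": read_delimited,
    "bracketed": read_bracketed,
}


def make_array_reader(properties):
    """Pick the array reader named by the (hypothetical) 'array.format' property."""
    fmt = properties.get("array.format", "delimited")
    try:
        return ARRAY_READERS[fmt]
    except KeyError:
        raise ValueError("unknown array format: %r" % fmt)


old_table = make_array_reader({})                              # existing data
new_table = make_array_reader({"array.format": "bracketed"})   # opted in
```

The point of the default is that tables created before the parameter existed need no metadata change, while new tables can opt in to the marker-based layout.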

Again, I am not suggesting that we should add several serialization formats to the same SerDe.
All I am suggesting is that there is a middle ground between proliferating a class for each
small feature difference and putting all features into a single class.

> LazySimpleSerDe should support array and map types
> --------------------------------------------------
>                 Key: HIVE-337
>                 URL: https://issues.apache.org/jira/browse/HIVE-337
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.2.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Blocker
> Once we do that, we can completely deprecate DynamicSerDe/TCTLSeparatedProtocol, and
> close any bugs that DynamicSerDe/TCTLSeparatedProtocol has.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
