hive-dev mailing list archives

From "Lefty Leverenz (JIRA)" <>
Subject [jira] [Commented] (HIVE-7142) Hive multi serialization encoding support
Date Tue, 11 Nov 2014 02:46:34 GMT


Lefty Leverenz commented on HIVE-7142:

[~chengxiang li], did you document this in the wiki yet?  If so, we can remove the TODOC14 label.

If not, suggested doc locations are listed in a previous comment.

> Hive multi serialization encoding support
> -----------------------------------------
>                 Key: HIVE-7142
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>              Labels: TODOC14
>             Fix For: 0.14.0
>         Attachments: HIVE-7142.1.patch.txt, HIVE-7142.2.patch, HIVE-7142.3.patch, HIVE-7142.4.patch
> Currently Hive only supports serializing data into UTF-8 bytes and deserializing data from UTF-8 bytes. Real-world users may want to load data in other encodings into Hive directly. This JIRA adds support for serializing and deserializing data in all kinds of encodings in the SerDe.
> Users only need to configure the serialization encoding at the table level by setting the "serialization.encoding" SerDe parameter, for example:
> {code:sql}
> CREATE TABLE person(id INT, name STRING, desc STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES ('serialization.encoding'='GBK');
> {code}
> or
> {code:sql}
> ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); 
> {code}
> LIMITATIONS: Only LazySimpleSerDe supports the "serialization.encoding" property in this patch.
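
To illustrate why a table-level encoding setting matters, here is a minimal Python sketch (not Hive code, and not a description of how LazySimpleSerDe is implemented). It assumes a sample two-character name stored as GBK bytes; the helper name `reencode` and the sample value are hypothetical. Reading those bytes as UTF-8 corrupts the value, while decoding them with the declared charset recovers it.

```python
# Illustration only: shows the effect of decoding bytes with the wrong charset.
# The sample value and the helper name below are hypothetical.

text = "\u5f20\u4e09"            # a sample two-character Chinese name
gbk_bytes = text.encode("gbk")   # the bytes as they would sit in a GBK-encoded file

# Treating the GBK bytes as UTF-8 does not recover the original value;
# invalid sequences are replaced with U+FFFD.
as_utf8 = gbk_bytes.decode("utf-8", errors="replace")

# Decoding with the declared charset recovers the original value.
as_gbk = gbk_bytes.decode("gbk")


def reencode(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from source_encoding and re-encode them as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


utf8_bytes = reencode(gbk_bytes, "gbk")
```

Conceptually, declaring 'serialization.encoding'='GBK' on the table tells the reader which charset to pass to the decode step, instead of assuming UTF-8.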

This message was sent by Atlassian JIRA
