hive-dev mailing list archives

From "Chengxiang Li (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-7142) Hive multi serialization encoding support
Date Wed, 06 Aug 2014 08:47:12 GMT

     [ https://issues.apache.org/jira/browse/HIVE-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chengxiang Li updated HIVE-7142:
--------------------------------

    Description: 
Currently Hive can only serialize data into UTF-8 bytes and deserialize data from UTF-8
bytes, but real-world users may want to load data in other character encodings into Hive
directly. This JIRA adds support for serializing and deserializing data in other encodings
at the SerDe layer.
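
As a rough illustration of what charset-aware handling at the SerDe layer involves, the
following is a minimal standalone Java sketch. It is not taken from the attached patches;
the class name EncodingSketch and its helper methods are hypothetical, and it assumes the
JVM provides the GBK charset. It decodes field bytes with the configured encoding, encodes
them back, and converts them to UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; not part of Hive.
public class EncodingSketch {

    // Deserialize direction: decode raw field bytes using the configured charset.
    static String decode(byte[] raw, String encodingName) {
        return new String(raw, Charset.forName(encodingName));
    }

    // Serialize direction: encode a string into bytes in the configured charset.
    static byte[] encode(String value, String encodingName) {
        return value.getBytes(Charset.forName(encodingName));
    }

    // Convert bytes in the configured charset into UTF-8 bytes.
    static byte[] toUtf8(byte[] raw, String encodingName) {
        return decode(raw, encodingName).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The two bytes below are the GBK encoding of the character U+4E2D.
        byte[] gbkBytes = {(byte) 0xD6, (byte) 0xD0};
        String text = decode(gbkBytes, "GBK");
        System.out.println("decoded correctly: " + text.equals("\u4e2d"));
        System.out.println("UTF-8 length: " + toUtf8(gbkBytes, "GBK").length);
    }
}
```

Reading the same two bytes as UTF-8 would not yield the intended character, which is why
the encoding has to be known when the data is deserialized.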

Users only need to configure the serialization encoding at the table level by setting the
"serialization.encoding" SerDe property, for example:

{code:sql}
CREATE TABLE person(id INT, name STRING, desc STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ("serialization.encoding"='GBK');
{code}

or

{code:sql}
ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); 
{code}

LIMITATIONS: Only LazySimpleSerDe supports the "serialization.encoding" property in this patch.

  was:
Hive only supports serializing/deserializing in UTF-8, but real-world users want to load
data in other encodings into Hive directly. Many customers in the PRC, for example, would
like to load GBK-encoded data.
We support configuring the serialization encoding at the table level by setting it through
a SerDe parameter, for example:
{noformat}
alter table test set serdeproperties ('serialization.encoding'='GBK'); 
{noformat}

LIMITATIONS: Only LazySimpleSerDe supports the "serialization.encoding" property in this patch.


> Hive multi serialization encoding support
> -----------------------------------------
>
>                 Key: HIVE-7142
>                 URL: https://issues.apache.org/jira/browse/HIVE-7142
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>         Attachments: HIVE-7142.1.patch.txt, HIVE-7142.2.patch
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
