hive-user mailing list archives

From Dana Ram Meghwal <dana...@saavn.com>
Subject Re: Hive Serialization issues
Date Wed, 23 Nov 2016 09:53:01 GMT
Hey,
Any leads?

On Tue, Nov 22, 2016 at 5:35 PM, Dana Ram Meghwal <danaram@saavn.com> wrote:

> Hey All,
>
> I am using Hive 2.0 with an external metastore on EMR-5.0.0, with Tez as
> the execution engine.
> Our data is stored in JSON format, so for serialization and
> deserialization we are planning to use the lazy SerDe
> (class name 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe').
>
> My table definition is
>
> CREATE EXTERNAL TABLE IF NOT EXISTS daily_active_users_summary_json_partition_dt_paths_v1
> (uid string, city string, user string, songcount string, songid_list array<string>)
> PARTITIONED BY (dt string)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES ('paths'='uid,city,user,songcount,songid_list')
> LOCATION 's3://<bucketname removed>/users/daily_active_users_summary_json_partition_dt';
>
>
> and the data looks like this:
>
> {"uid":"xxxxxxyyyy","listening_user_flag":"non_listening","platform":"android","model":"micromax a110q","aquisition_channel":"organic","state":"delhi","app_version":"3.2:","country":"IN","city":"new delhi","new_listening_user_flag":"non_listening","manufacturer":"Micromax","login_mode":"loggedout","new_user_flag":"returning","digital_channel":"Not Source"}
>
>
> Note: I have pasted one record from the table here.
>
>
> Now, when I run the query
>
> select * from daily_active_users_summary_json_partition_dt_paths_v1 limit 5;
>
>
> the first field of the table takes the complete record and the rest of the
> fields show as NULL.
>
> When I use a different SerDe, 'org.apache.hive.hcatalog.data.JsonSerDe',
> the above query works fine and the data is deserialized
> correctly. We want to use the lazy SerDe because our data contains
> non-UTF-8 characters, and the latter SerDe does not support
> serializing or deserializing non-UTF-8 characters.
>
>
> Can you please help me solve this? We would prefer to use the lazy SerDe,
> as we have already experimented with other SerDes and none of them works
> for us. Is there any configuration that enables
> serialization/deserialization while using the lazy SerDe?
>
> Or is there any other SerDe that can process non-UTF-8 characters in
> Hive 2 on Tez?
>
> Thank you
>
>
> Best Regards,
> Dana Ram Meghwal
> Software Engineer
> danaram@saavn.com
>
>
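
A possible reading of the symptom described above, offered as an assumption rather than a confirmed diagnosis: LazySimpleSerDe reads delimited text, splitting each line on a field delimiter (Ctrl-A, \x01, by default), and does not parse JSON. A JSON record contains no such delimiter, so the whole line would land in the first column and the remaining columns would be NULL. The 'paths' property is presumably not interpreted by this SerDe. The Python sketch below only simulates that behavior outside Hive; split_delimited and parse_json are hypothetical names, and the sample record is a shortened version of the one in the message.

```python
import json

# Assumption: a delimited-text SerDe splits each line on Hive's default
# field delimiter, Ctrl-A (\x01), and never parses JSON.
FIELD_DELIM = "\x01"
COLUMNS = ["uid", "city", "user", "songcount", "songid_list"]


def split_delimited(line, columns=COLUMNS, delim=FIELD_DELIM):
    """Mimic a delimited-text reader: split on delim, pad missing columns with None."""
    parts = line.split(delim)
    parts += [None] * (len(columns) - len(parts))
    return dict(zip(columns, parts))


def parse_json(line, columns=COLUMNS):
    """Mimic a JSON reader: look each column up by key, None when the key is absent."""
    record = json.loads(line)
    return {name: record.get(name) for name in columns}


# Shortened version of the record pasted in the message.
sample = '{"uid":"xxxxxxyyyy","city":"new delhi","country":"IN"}'

delimited_row = split_delimited(sample)
json_row = parse_json(sample)

print(delimited_row)  # the whole line sits under "uid"; other columns are None
print(json_row)       # "uid" and "city" are filled from the matching JSON keys
```

If this reading is right, no SERDEPROPERTIES setting would make a delimited-text SerDe pick fields out of JSON by name; a JSON-aware SerDe would still be needed, and the non-UTF-8 problem would have to be handled separately.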


-- 
Dana Ram Meghwal
Software Engineer
danaram@saavn.com
