hive-user mailing list archives

From Ajay Tirpude <tirpudeaj...@gmail.com>
Subject Re: Nested JSON Parsing
Date Sun, 13 Nov 2016 07:10:07 GMT
I got it. We can write them in Python and JRuby.
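
A minimal sketch of what the Python route could look like for the requirements quoted below: one output row per entry in events, the timestamp key dropped, and blanks for keys an event lacks. It assumes one complete JSON document per input line and a fixed column list taken from the sample. Event keys outside that list are dropped here, since a Hive table needs a fixed schema anyway. The names flatten, lookup and run are illustrative, not part of Hive.

```python
import json

# Assumed fixed column set, copied from the sample in this thread.
TOP_FIELDS = ["devicetype", "uuid", "ts.date"]
EVENT_FIELDS = ["evt", "ad", "tkey", "mtp", "eventid"]  # "timestamp" left out
TAIL_FIELDS = ["adid", "ad_tz.date", "ua"]


def lookup(doc, dotted):
    """Follow a dotted path such as 'ts.date'; return '' if any part is missing."""
    node = doc
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return ""
        node = node[part]
    return "" if node is None else str(node)


def flatten(doc):
    """Yield one list of column values per entry in doc['events']."""
    head = [lookup(doc, f) for f in TOP_FIELDS]
    tail = [lookup(doc, f) for f in TAIL_FIELDS]
    for event in doc.get("events") or []:
        middle = [lookup(event, f) for f in EVENT_FIELDS]
        yield head + middle + tail


def run(lines, out):
    """Read JSON documents (one per line) and write tab-separated rows."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        for row in flatten(json.loads(line)):
            out.write("\t".join(row) + "\n")

# To use as a streaming script: import sys, then call run(sys.stdin, sys.stdout)
```

Saved as a script (say flatten.py, a hypothetical name) with the run(sys.stdin, sys.stdout) call added, this could be plugged into Hive through the TRANSFORM ... USING clause, which streams rows to an external program as tab-separated text. Values that themselves contain tabs or newlines would need escaping, which this sketch does not handle.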

On Sun, Nov 13, 2016 at 12:35 PM, Ajay Tirpude <tirpudeajay1@gmail.com>
wrote:

> Hi Satya,
>
> Thanks, I have already started checking the JSON SerDe. Let's see if it works.
> By the way, can we write UDFs in Python/Ruby?
>
> Regards,
> Ajay T
>
> On Sun, Nov 13, 2016 at 12:30 PM, Satya Harish Appana <
> satyaharish.appana@gmail.com> wrote:
>
>> You can use this:
>>
>> JSON SerDe: https://github.com/rcongiu/Hive-JSON-Serde
>>
>> Or else you can write a Hive UDTF (e.g.
>> http://beekeeperdata.com/posts/hadoop/2015/07/26/Hive-UDTF-Tutorial.html).
>>
>>
>>
>> On Sun, Nov 13, 2016 at 12:22 PM, Ajay Tirpude <tirpudeajay1@gmail.com>
>> wrote:
>>
>>> Hi Dudu,
>>>
>>> I want to parse my JSON file and get the desired output as a CSV file, as
>>> shown in the output section of my message. Currently I can achieve this
>>> with bash (the jq command), but that does not scale to JSON files that are
>>> terabytes in size. So I am looking for a solution in Pig or Hive.
>>>
>>> Regards,
>>> Ajay T
>>>
>>> On Sun, Nov 13, 2016 at 12:10 PM, Markovitz, Dudu
>>> <dmarkovitz@paypal.com> wrote:
>>>
>>>> And your issue/question is?
>>>>
>>>>
>>>>
>>>> *From:* Ajay Tirpude [mailto:tirpudeajay1@gmail.com]
>>>> *Sent:* Sunday, November 13, 2016 4:46 AM
>>>> *To:* user@hive.apache.org
>>>> *Subject:* Nested JSON Parsing
>>>>
>>>>
>>>>
>>>> Dear All,
>>>>
>>>>
>>>>
>>>> I am trying to parse the JSON file given below; my intention is to
>>>> convert it into a CSV file.
>>>>
>>>>
>>>>
>>>> {
>>>>   "devicetype": "SmartPhone",
>>>>   "uuid": "sg76fdhh7gfxhxfhgxf67x",
>>>>   "ts": {
>>>>     "date": "2016-03-23T10:58:34.660Z"
>>>>   },
>>>>   "events": [
>>>>     {
>>>>       "timestamp": "2016-03-23T10:58:37Z",
>>>>       "evt": "first",
>>>>       "ad": "v6v75v88n98778mn",
>>>>       "tkey": "ngbbc76fbc6fb6fb66fb6",
>>>>       "mtp": "Wed Mar 23 2016 19:04:22 GMT 0800 (PHT)",
>>>>       "eventid": "eytuy"
>>>>     },
>>>>     {
>>>>       "timestamp": "2016-03-23T10:58:35Z",
>>>>       "evt": "second",
>>>>       "ad": "v6v75v88n98778mn",
>>>>       "tkey": "ngbbc76fbc6fb6fb66fb6"
>>>>     },
>>>>     {
>>>>       "timestamp": "2016-03-23T10:58:36Z",
>>>>       "evt": "third",
>>>>       "ad": "v6v75v88n98778mn",
>>>>       "tkey": "ngbbc76fbc6fb6fb66fb6"
>>>>     }
>>>>   ],
>>>>   "adid": "v6v75v88n98778mn",
>>>>   "ad_tz": {
>>>>     "date": "2016-03-23T10:58:34.660Z"
>>>>   },
>>>>   "ua": "Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"
>>>> }
>>>>
>>>>
>>>>
>>>> There are a few conditions that I need to apply when parsing:
>>>>
>>>> 1. I want to get all the fields except timestamp inside the nested
>>>> events key.
>>>>
>>>> 2. I want to loop over the events key for each evt. The input above has
>>>> three evts, but the number is not fixed in the actual input files; there
>>>> can be any number of evts.
>>>>
>>>> 3. Not every evt block has the same keys. An evt block can have extra
>>>> fields, and we need to extract every key. If a key is missing from an
>>>> evt, its value should be blank for that evt. For example, evt "first" has
>>>> two extra key-value pairs, eventid and mtp, and these values should be
>>>> blank for the other evts. Similarly, other evts can have key-value pairs
>>>> of their own, which should be blank for the evts that lack them.
>>>>
>>>>
>>>>
>>>> Finally, I want the output to look like this:
>>>>
>>>>
>>>>
>>>> devicetype | uuid | ts.date | events.evt | events.ad | events.tkey | events.mtp | events.eventid | adid | ad_tz.date | ua
>>>> SmartPhone | sg76fdhh7gfxhxfhgxf67x | 2016-03-23T10:58:34.660Z | first | v6v75v88n98778mn | ngbbc76fbc6fb6fb66fb6 | Wed Mar 23 2016 19:04:22 GMT 0800 (PHT) | eytuy | v6v75v88n98778mn | 2016-03-23T10:58:34.660Z | Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30
>>>> SmartPhone | sg76fdhh7gfxhxfhgxf67x | 2016-03-23T10:58:34.660Z | second | v6v75v88n98778mn | ngbbc76fbc6fb6fb66fb6 |  |  | v6v75v88n98778mn | 2016-03-23T10:58:34.660Z | Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30
>>>> SmartPhone | sg76fdhh7gfxhxfhgxf67x | 2016-03-23T10:58:34.660Z | third | v6v75v88n98778mn | ngbbc76fbc6fb6fb66fb6 |  |  | v6v75v88n98778mn | 2016-03-23T10:58:34.660Z | Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Ajay T
>>>>
>>>
>>>
>>
>>
>> --
>>
>>
>> Regards,
>> Satya Harish Appana,
>> Software Development Engineer II,
>> Flipkart,Bangalore,
>> Ph:+91-9538797174.
>>
>
>
