Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 7CB90200BBC for ; Sun, 13 Nov 2016 08:05:50 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 7B363160AF8; Sun, 13 Nov 2016 07:05:50 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 2850D160AF2 for ; Sun, 13 Nov 2016 08:05:49 +0100 (CET) Received: (qmail 10682 invoked by uid 500); 13 Nov 2016 07:05:47 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 10668 invoked by uid 99); 13 Nov 2016 07:05:47 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 13 Nov 2016 07:05:47 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 4E722C1446 for ; Sun, 13 Nov 2016 07:05:47 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.929 X-Spam-Level: * X-Spam-Status: No, score=1.929 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, HTML_MESSAGE=2, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RCVD_IN_SORBS_SPAM=0.5, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd4-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id XZYBHKIvatKh for ; Sun, 13 Nov 2016 07:05:45 +0000 (UTC) Received: from mail-it0-f44.google.com (mail-it0-f44.google.com [209.85.214.44]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 4FA985F1C2 for ; Sun, 13 Nov 2016 07:05:44 +0000 (UTC) Received: by mail-it0-f44.google.com with SMTP id u205so52940926itc.0 for ; Sat, 12 Nov 2016 23:05:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=suJjKjbKSx5gjCB1YkCAXWk+9lDgcq1u7CIQ/eTm8xk=; b=GK/HDtRsQUFGUtTWzRvSI12Jbp3B9TTUAmEZofi8Ef7Wro4qv86dPia/5qvewbPK/+ Z5RDcImzGGzfZHmHDWpLlx/AiYaAtrnCxOTjzGQhSUgWxrNhc+G1wc1FPTJpVSAl40qW HCK6ZOKZr0Ma4GMX1FHbJwOukVuxQfXuBTRooyT5CmO+gE+PyD6tmCVDDPcpgvU7r82c QQStIYeiCNCezc9DQQO/yGBrElfZ9oEkDqY2wxfnFyd61H1k31FTBG6bddHyDVDMU6Sk W/F4Sa+krUtEN0+qlAO0Oa/3q0HJyDNsYhoI0EyqFjudEzEjgdHgmR6+b0G7FkIcKKgV 44AA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=suJjKjbKSx5gjCB1YkCAXWk+9lDgcq1u7CIQ/eTm8xk=; b=MWPSyBEgrRQRyPFw1C3lIJFJ12cUZEpso37boVUtRljI/hqHCOid5AR9rSKNOzJ3Ow Mhm9p+3i2bVBCrhmJM3Z3Q0ns90vLmlNVu7/OEOxedMEfRApeS45a8gwDZZXl3RDPB8a hgeTe0NEzIvkSAJzWPYYvbdXedEayeDH8tyCR0u+QjdIyKW5Rw9CDAqiETYGHjzgwCbS kKQq1R1QQRobcLWd52MmjlUY5S659gxqqXNS49A7WkmpCiMmS6ueorDwCaJZ17qGTYEG dDd0UsDyKi3jYl6CoEKEEMADpX+KFeKVJn31zfL/cS5N51QZQLLrNncjPpIpPM5NaKbx RpcQ== X-Gm-Message-State: ABUngvfhCGZiSIF28Q6LvXICyKmacjryM+Oxfg5gi7HiBLav9djQFhwpm8VPYijBDUpexBV/d7j4mB+hZ7v86Q== X-Received: by 10.36.66.212 with SMTP id i203mr2731848itb.80.1479020743048; Sat, 12 Nov 2016 23:05:43 -0800 (PST) MIME-Version: 1.0 Received: by 10.36.40.73 with HTTP; Sat, 12 Nov 2016 23:05:42 -0800 (PST) In-Reply-To: References: From: Ajay Tirpude Date: Sun, 13 Nov 2016 12:35:42 +0530 Message-ID: Subject: Re: Nested JSON Parsing To: user@hive.apache.org Cc: dmarkovitz@paypal.com Content-Type: multipart/alternative; boundary=001a1144a93e01420c0541295a29 archived-at: Sun, 13 Nov 2016 07:05:50 -0000 --001a1144a93e01420c0541295a29 Content-Type: text/plain; charset=UTF-8 Hi Satya, Thanks I have already started checking JSON Serde. Lets see if it works. By the way can we write UDFs in Python/Ruby? Regards, Ajay T On Sun, Nov 13, 2016 at 12:30 PM, Satya Harish Appana < satyaharish.appana@gmail.com> wrote: > You can use these > > *Json Serde: *https://github.com/rcongiu/Hive-JSON-Serde > > or else you can write a hive udtf, (Eg: http://beekeeperdata.com/ > posts/hadoop/2015/07/26/Hive-UDTF-Tutorial.html) > > > > On Sun, Nov 13, 2016 at 12:22 PM, Ajay Tirpude > wrote: > >> Hi Dudu, >> >> I want to parse my json file and get the desired output in csv file that >> I pasted in the output section. Currently I am able to achieve this using >> bash(jq command) but that is not an answer for json files that are in TBs. >> So I am looking for a solution in PIG or HIVE. >> >> Regards, >> Ajay T >> >> On Sun, Nov 13, 2016 at 12:10 PM, Markovitz, Dudu >> wrote: >> >>> And your issue/question is? >>> >>> >>> >>> *From:* Ajay Tirpude [mailto:tirpudeajay1@gmail.com] >>> *Sent:* Sunday, November 13, 2016 4:46 AM >>> *To:* user@hive.apache.org >>> *Subject:* Nested JSON Parsing >>> >>> >>> >>> Dear All, >>> >>> >>> >>> I am trying to parse this json file given below and my intention is to >>> convert this json file into a csv. >>> >>> >>> >>> *{* >>> >>> * "devicetype": "SmartPhone",* >>> >>> * "uuid": "sg76fdhh7gfxhxfhgxf67x",* >>> >>> * "ts": {* >>> >>> * "date": "2016-03-23T10:58:34.660Z"* >>> >>> * },* >>> >>> * "events": [* >>> >>> * {* >>> >>> * "timestamp": "2016-03-23T10:58:37Z",* >>> >>> * "evt": "first",* >>> >>> * "ad": "v6v75v88n98778mn",* >>> >>> * "tkey": "ngbbc76fbc6fb6fb66fb6",* >>> >>> * "mtp": "Wed Mar 23 2016 19:04:22 GMT 0800 (PHT)",* >>> >>> * "eventid": "eytuy"* >>> >>> * },* >>> >>> * {* >>> >>> * "timestamp": "2016-03-23T10:58:35Z",* >>> >>> * "evt": "second",* >>> >>> * "ad": "v6v75v88n98778mn",* >>> >>> * "tkey": "ngbbc76fbc6fb6fb66fb6"* >>> >>> * },* >>> >>> * {* >>> >>> * "timestamp": "2016-03-23T10:58:36Z",* >>> >>> * "evt": "third",* >>> >>> * "ad": "v6v75v88n98778mn",* >>> >>> * "tkey": "ngbbc76fbc6fb6fb66fb6"* >>> >>> * }* >>> >>> * ],* >>> >>> * "adid": "v6v75v88n98778mn",* >>> >>> * "ad_tz": {* >>> >>> * "date": "2016-03-23T10:58:34.660Z"* >>> >>> * },* >>> >>> * "ua": "Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 >>> Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile >>> Safari/534.30"* >>> >>> *}* >>> >>> >>> >>> There are few conditions that I need to apply before I parse >>> >>> >>> >>> 1. I want to get all the fields except timestamp inside events nested >>> key. >>> >>> 2. I want to loop events key for each evt. In above input file there are >>> three evts but that would not fixed in the actual input file. There can be >>> multiple evts and not just 3. >>> >>> 3. Not every evt block is similar. You can have different extra field in >>> each evt block but we need to extract every key. In case we don't have key >>> in one evt then the value should be blank for that env. For example for >>> evt: first we have two extra key value pair i.,e, eventid/mtp and these >>> value should be blank for other evts. Similarly we can have some key:value >>> in other evts as well so that other key:values should be blank in other >>> evts. >>> >>> >>> >>> At last I want the output to be like this >>> >>> >>> >>> devicetype >>> >>> uuid >>> >>> ts.date >>> >>> events.evt >>> >>> events.ad >>> >>> events.tkey >>> >>> events.mtp >>> >>> events.eventid >>> >>> adid >>> >>> ad_tz.date >>> >>> ua >>> >>> SmartPhone >>> >>> sg76fdhh7gfxhxfhgxf67x >>> >>> 2016-03-23T10:58:34.660Z >>> >>> first >>> >>> v6v75v88n98778mn >>> >>> ngbbc76fbc6fb6fb66fb6 >>> >>> Wed Mar 23 2016 19:04:22 GMT 0800 (PHT) >>> >>> eytuy >>> >>> v6v75v88n98778mn >>> >>> 2016-03-23T10:58:34.660Z >>> >>> Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) >>> AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30 >>> >>> SmartPhone >>> >>> sg76fdhh7gfxhxfhgxf67x >>> >>> 2016-03-23T10:58:34.660Z >>> >>> second >>> >>> v6v75v88n98778mn >>> >>> ngbbc76fbc6fb6fb66fb6 >>> >>> v6v75v88n98778mn >>> >>> 2016-03-23T10:58:34.660Z >>> >>> Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) >>> AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30 >>> >>> SmartPhone >>> >>> sg76fdhh7gfxhxfhgxf67x >>> >>> 2016-03-23T10:58:34.660Z >>> >>> third >>> >>> v6v75v88n98778mn >>> >>> ngbbc76fbc6fb6fb66fb6 >>> >>> >>> >>> >>> >>> v6v75v88n98778mn >>> >>> 2016-03-23T10:58:34.660Z >>> >>> Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) >>> AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30 >>> >>> >>> >>> Regards, >>> >>> Ajay T >>> >> >> > > > -- > > > Regards, > Satya Harish Appana, > Software Development Engineer II, > Flipkart,Bangalore, > Ph:+91-9538797174. > --001a1144a93e01420c0541295a29 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hi Satya,

Thanks I have already started= checking JSON Serde. Lets see if it works. By the way can we write UDFs in= Python/Ruby?

Regards,
Ajay T

On Sun, Nov 13, = 2016 at 12:30 PM, Satya Harish Appana <satyaharish.appana@gmail= .com> wrote:
You can use these

=

= or else you can write a hive udtf, (Eg:=C2= =A0http://beekee= perdata.com/posts/hadoop/2015/07/26/Hive-UDTF-Tutorial.html)


=

On Sun, Nov 13, 2016 = at 12:22 PM, Ajay Tirpude <tirpudeajay1@gmail.com> wrot= e:
Hi Dudu,

I want to parse my json file and get the desired output in csv file = that I pasted in the output section. Currently I am able to achieve this us= ing bash(jq command) but that is not an answer for json files that are in T= Bs. So I am looking for a solution in PIG or HIVE.

Regards,
Ajay T
On Sun, Nov 13, 2016 at 12:10 PM, Markovitz, Du= du <dmarkovitz@paypal.com> wrote:

And your issue/question is?=

=C2=A0

From: Ajay Tirpude [mailto:tirpudeajay1@gmail.com= ]
Sent: Sunday, November 13, 2016 4:46 AM
To: user@h= ive.apache.org
Subject: Nested JSON Parsing

=C2=A0

Dear All,

=C2=A0

I am trying to parse this json file given below and = my intention is to convert this json file into a csv.

=C2=A0

{

=C2=A0 "devicetype": "SmartPhone&q= uot;,

=C2=A0 "uuid": "sg76fdhh7gfxhxfhgx= f67x",

=C2=A0 "ts": {

=C2=A0 =C2=A0 "date": "2016-03-23T= 10:58:34.660Z"

=C2=A0 },

=C2=A0 "events": [

=C2=A0 =C2=A0 {

=C2=A0 =C2=A0 =C2=A0 "timestamp": "= ;2016-03-23T10:58:37Z",

=C2=A0 =C2=A0 =C2=A0 "evt": "first= ",

=C2=A0 =C2=A0 =C2=A0 "ad": "v6v75v= 88n98778mn",

=C2=A0 =C2=A0 =C2=A0 "tkey": "ngbb= c76fbc6fb6fb66fb6",

=C2=A0 =C2=A0 =C2=A0 "mtp": "Wed M= ar 23 2016 19:04:22 GMT 0800 (PHT)",

=C2=A0 =C2=A0 =C2=A0 "eventid": "e= ytuy"

=C2=A0 =C2=A0 },

=C2=A0 =C2=A0 {

=C2=A0 =C2=A0 =C2=A0 "timestamp": "= ;2016-03-23T10:58:35Z",

=C2=A0 =C2=A0 =C2=A0 "evt": "secon= d",

=C2=A0 =C2=A0 =C2=A0 "ad": "v6v75v= 88n98778mn",

=C2=A0 =C2=A0 =C2=A0 "tkey": "ngbb= c76fbc6fb6fb66fb6"

=C2=A0 =C2=A0 },

=C2=A0 =C2=A0 {

=C2=A0 =C2=A0 =C2=A0 "timestamp": "= ;2016-03-23T10:58:36Z",

=C2=A0 =C2=A0 =C2=A0 "evt": "third= ",

=C2=A0 =C2=A0 =C2=A0 "ad": "v6v75v= 88n98778mn",

=C2=A0 =C2=A0 =C2=A0 "tkey": "ngbb= c76fbc6fb6fb66fb6"

=C2=A0 =C2=A0 }

=C2=A0 ],

=C2=A0 "adid": "v6v75v88n98778mn&q= uot;,

=C2=A0 "ad_tz": {

=C2=A0 =C2=A0 "date": "2016-03-23T= 10:58:34.660Z"

=C2=A0 },

=C2=A0 "ua": "Mozilla/5.0 (Linux; = U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, li= ke Gecko) Version/4.0 Mobile Safari/534.30"

}

=C2=A0

There are few conditions that I need to apply before= I parse

=C2=A0

1. I want to get all the fields except timestamp ins= ide events nested key.

2. I want to loop events key for each evt. In above = input file there are three evts but that would not fixed in the actual inpu= t file. There can be multiple evts and not just 3.

3. Not every evt block is similar. You can have diff= erent extra field in each evt block but we need to extract every key. In ca= se we don't have key in one evt then the value should be blank for that= env. For example for evt: first we have two extra key value pair i.,e, eventid/mtp and these value should be blank= for other evts. Similarly we can have some key:value in other evts as well= so that other key:values should be blank in other evts.

=C2=A0

At last I want the output to be like this<= /u>

=C2=A0

devicetype

uuid

ts.date

events.evt

event= s.ad

events.tkey

events.mtp

events.eventid

adid

ad_tz.date

ua

SmartPhone

sg76fdhh7gfxhxfhgxf67x

2016-03-23T10:58:34.660Z

first

v6v75v88n98778mn

ngbbc76fbc6fb6fb66fb6

Wed Mar 23 2016 19:04:22 GMT 0800 (PHT)

eytuy

v6v75v88n98778mn

2016-03-23T10:58:34.660Z

Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 = Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Saf= ari/534.30

SmartPhone

sg76fdhh7gfxhxfhgxf67x

2016-03-23T10:58:34.660Z

second

v6v75v88n98778mn

ngbbc76fbc6fb6fb66fb6

v6v75v88n98778mn

2016-03-23T10:58:34.660Z

Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 = Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Saf= ari/534.30

SmartPhone

sg76fdhh7gfxhxfhgxf67x

2016-03-23T10:58:34.660Z

third

v6v75v88n98778mn

ngbbc76fbc6fb6fb66fb6

=C2=A0

=C2=A0

v6v75v88n98778mn

2016-03-23T10:58:34.660Z

Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 = Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Saf= ari/534.30

=C2=A0

Regards,

Ajay T





<= /div>--
=09 =09 =09


Regards,
Satya Harish=C2=A0Appana,
Software Develop= ment Engineer II,
Flipkart,Bangalore,
Ph:+91-9538797174.<= br>

--001a1144a93e01420c0541295a29--