Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A626F17A8A for ; Tue, 27 Oct 2015 20:25:36 +0000 (UTC) Received: (qmail 59358 invoked by uid 500); 27 Oct 2015 20:25:35 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 59102 invoked by uid 500); 27 Oct 2015 20:25:35 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 59092 invoked by uid 99); 27 Oct 2015 20:25:35 -0000 Received: from Unknown (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 27 Oct 2015 20:25:35 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id AB29D1A2D2A for ; Tue, 27 Oct 2015 20:25:34 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.901 X-Spam-Level: ** X-Spam-Status: No, score=2.901 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=3, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-us-east.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id LoN7AbGIBFTJ for ; Tue, 27 Oct 2015 20:25:19 +0000 (UTC) Received: from mail-ig0-f179.google.com (mail-ig0-f179.google.com [209.85.213.179]) by mx1-us-east.apache.org (ASF Mail Server at mx1-us-east.apache.org) with ESMTPS id 86C4243CD5 for ; Tue, 27 Oct 2015 20:25:19 +0000 (UTC) Received: by igbkq10 with SMTP id kq10so91865077igb.0 for ; Tue, 27 Oct 2015 13:25:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=Fa9RBe2Y6wJ0I9JPClWoXRwL6QyHoyparLeB0WwjsFc=; b=aGDAJeJpYT1mRfQEnii+LtfZb5cGiEP7AEY4UgSYzBOWmccNAGrjnY0L696lLYxYM6 Qmh17ahs/zoezATpgCPhkXK3tqMIpAkiQgiUAwTmrzy/8BUc1zLGOybmblqMj8T6LkRq FodYL0Aq4HAYeEW9p4Q1iwTd9j8x+qKCEogqAknvsOVeqFXBM3AHDwqOQ70HqzD5Vm8P hmCvkzS9+/jONIe/5OdXYgWEs57noC2Hgt5V66Ejq9Om0mt3PGktQD5xHozf/Au52mSg dIk4xBhmJ60BPwC6OUfyxvq/pSgjqyBDtdekY9/Y/hzqOU2nqEzkHEARCxTnTrkhvJ3V o80Q== MIME-Version: 1.0 X-Received: by 10.50.142.8 with SMTP id rs8mr28719811igb.77.1445977519107; Tue, 27 Oct 2015 13:25:19 -0700 (PDT) Received: by 10.107.141.69 with HTTP; Tue, 27 Oct 2015 13:25:19 -0700 (PDT) In-Reply-To: References: Date: Tue, 27 Oct 2015 16:25:19 -0400 Message-ID: Subject: Re: Using json_tuple for Nested json Arrays From: Sam Joe To: user@hive.apache.org Content-Type: multipart/alternative; boundary=001a11c2f70e61342905231be058 --001a11c2f70e61342905231be058 Content-Type: text/plain; charset=UTF-8 Hi Ryan, The simple query is running fine as shown below: hive> SELECT tr2.id, tr2.possibly_sensitive > FROM tweets_raw tr1 > LATERAL VIEW json_tuple(tr1.text_col, 'id', 'extended_entities', 'possibly_sensitive') tr2 as id, extended_entities, possibly_sensitive > where tr2.id='654395184428515332' > LIMIT 1; OK 654395184428515332 false Time taken: 1.813 seconds, Fetched: 1 row(s) hive> However, if I try to get any data from the json array it's failing. Thanks, Joel On Tue, Oct 27, 2015 at 4:21 PM, Ryan Harris wrote: > looking at your sample data, you shouldn't need to use lateral view > explode unless you are trying to get 1 entry per row for your media sizes > (thumb, small, large, medium, etc) ... > > > > Try starting with something simple like : > > > > SELECT get_json_object(text_col, '$.id') as id FROM tweets_raw limit 10; > > > > You should also be able to use json_tuple(), but start simple.... > > > > *From:* Sam Joe [mailto:games2013.sam@gmail.com] > *Sent:* Tuesday, October 27, 2015 1:43 PM > > *To:* user@hive.apache.org > *Subject:* Re: Using json_tuple for Nested json Arrays > > > > Hi Ryan, > > > > Thanks for your reply! I didn't try using get_json_object() UDF. I will > try using that and let you know the results. > > > > I tried using the following script which failed : > > > > SELECT tr2.id, tr2.possibly_sensitive, tr3.media, > media_object.source_user_id > > FROM tweets_raw tr1 > > LATERAL VIEW json_tuple(tr1.text_col, 'id', 'extended_entities', > 'possibly_sensitive') tr2 as id, extended_entities, possibly_sensitive > > LATERAL VIEW json_tuple(tr2.extended_entities, 'media') tr3 as media > > LATERAL VIEW EXPLODE(tr3.media) media_exploded as media_object > > where tr2.id='654395184428515332' > > LIMIT 1; > > > > FAILED: UDFArgumentException explode() takes an array or a map as a > parameter > > > > > > Thanks, > > Joel > > > > On Tue, Oct 27, 2015 at 3:37 PM, Ryan Harris > wrote: > > Do you have an example of the query that you tried (which failed). > > In short, you probably want to use the get_json_object() UDF: > > > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-get_json_object > > > > if you need the JSON array broken into individual records, you might > require lateral view explode as in these examples: > > > http://mechanics.flite.com/blog/2014/04/16/using-explode-and-lateral-view-in-hive/ > > > http://stackoverflow.com/questions/28716165/how-to-query-struct-array-with-hive-get-json-object > > > > > > *From:* Sam Joe [mailto:games2013.sam@gmail.com] > *Sent:* Tuesday, October 27, 2015 1:29 PM > *To:* user@hive.apache.org > *Subject:* Re: Using json_tuple for Nested json Arrays > > > > I tried using EXPLODE function on the nested json array but it doesn't > work and throws following error: > > > > FAILED: UDFArgumentException explode() takes an array or a map as a > parameter > > > > > > Thanks, > > Joel > > > > On Tue, Oct 27, 2015 at 3:20 PM, Sam Joe wrote: > > Hi, > > > > Is it possible to use json_tuple function to extract data from json arrays > (nested too). I am trying to process json data as string and avoid using > serdes since user data may be malformed. > > > > Please see a sample json data given below: > > > > > > { > > "filter_level": "low", > > "retweeted": false, > > "in_reply_to_screen_name": null, > > "possibly_sensitive": false, > > "truncated": false, > > "lang": "en", > > "in_reply_to_status_id_str": null, > > "id": 654395184428515332, > > "extended_entities": { > > "media": [{ > > "sizes": { > > "thumb": { > > "w": 150, > > "resize": "crop", > > "h": 150 > > }, > > "small": { > > "w": 340, > > "resize": "fit", > > "h": 255 > > }, > > "large": { > > "w": 1024, > > "resize": "fit", > > "h": 768 > > }, > > "medium": { > > "w": 600, > > "resize": "fit", > > "h": 450 > > } > > }, > > "source_user_id": 16864598, > > "media_url": "http://pbs.twimg.com/media/CRSL2MPWsAAOnZo.jpg", > > "display_url": "pic.twitter.com/i3004WyF4g", > > "type": "photo", > > "url": "http://t.co/i3004WyF4g", > > "id": 654301608990388224, > > "media_url_https": "https://pbs.twimg.com/media/CRSL2MPWsAAOnZo.jpg", > > "expanded_url": " > http://twitter.com/lordlancaster/status/654301626665189376/photo/1", > > "source_user_id_str": "16864598", > > "indices": [143, > > 144], > > "source_status_id_str": "654301626665189376", > > "source_status_id": 654301626665189376, > > "id_str": "654301608990388224" > > }, > > { > > "sizes": { > > "thumb": { > > "w": 150, > > "resize": "crop", > > "h": 150 > > }, > > "small": { > > "w": 340, > > "resize": "fit", > > "h": 255 > > }, > > "large": { > > "w": 1024, > > "resize": "fit", > > "h": 768 > > }, > > "medium": { > > "w": 600, > > "resize": "fit", > > "h": 450 > > } > > }, > > "source_user_id": 16864598, > > "media_url": "http://pbs.twimg.com/media/CRSL2MRWgAAGOcj.jpg", > > "display_url": "pic.twitter.com/i3004WyF4g", > > "type": "photo", > > "url": "http://t.co/i3004WyF4g", > > "id": 654301608998764544, > > "media_url_https": "https://pbs.twimg.com/media/CRSL2MRWgAAGOcj.jpg", > > "expanded_url": " > http://twitter.com/lordlancaster/status/654301626665189376/photo/1", > > "source_user_id_str": "16864598", > > "indices": [143, > > 144], > > "source_status_id_str": "654301626665189376", > > "source_status_id": 654301626665189376, > > "id_str": "654301608998764544" > > }, > > { > > "sizes": { > > "thumb": { > > "w": 150, > > "resize": "crop", > > "h": 150 > > }, > > "small": { > > "w": 340, > > "resize": "fit", > > "h": 255 > > }, > > "large": { > > "w": 1024, > > "resize": "fit", > > "h": 768 > > }, > > "medium": { > > "w": 600, > > "resize": "fit", > > "h": 450 > > } > > }, > > "source_user_id": 16864598, > > "media_url": "http://pbs.twimg.com/media/CRSL2MQWwAAP4Qo.jpg", > > "display_url": "pic.twitter.com/i3004WyF4g", > > "type": "photo", > > "url": "http://t.co/i3004WyF4g", > > "id": 654301608994586624, > > "media_url_https": "https://pbs.twimg.com/media/CRSL2MQWwAAP4Qo.jpg", > > "expanded_url": " > http://twitter.com/lordlancaster/status/654301626665189376/photo/1", > > "source_user_id_str": "16864598", > > "indices": [143, > > 144], > > "source_status_id_str": "654301626665189376", > > "source_status_id": 654301626665189376, > > "id_str": "654301608994586624" > > }, > > { > > "sizes": { > > "thumb": { > > "w": 150, > > "resize": "crop", > > "h": 150 > > }, > > "small": { > > "w": 340, > > "resize": "fit", > > "h": 255 > > }, > > "large": { > > "w": 1024, > > "resize": "fit", > > "h": 768 > > }, > > "medium": { > > "w": 600, > > "resize": "fit", > > "h": 450 > > } > > }, > > "source_user_id": 16864598, > > "media_url": "http://pbs.twimg.com/media/CRSL2M8WcAEXowZ.jpg", > > "display_url": "pic.twitter.com/i3004WyF4g", > > "type": "photo", > > "url": "http://t.co/i3004WyF4g", > > "id": 654301609179115521, > > "media_url_https": "https://pbs.twimg.com/media/CRSL2M8WcAEXowZ.jpg", > > "expanded_url": " > http://twitter.com/lordlancaster/status/654301626665189376/photo/1", > > "source_user_id_str": "16864598", > > "indices": [143, > > 144], > > "source_status_id_str": "654301626665189376", > > "source_status_id": 654301626665189376, > > "id_str": "654301609179115521" > > }] > > } > > } > > > > Appreciate any help! > > > > *Thanks,* > > *Joel* > > > > > > > ------------------------------ > > THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS > CONFIDENTIAL and may contain information that is privileged and exempt from > disclosure under applicable law. If you are neither the intended recipient > nor responsible for delivering the message to the intended recipient, > please note that any dissemination, distribution, copying or the taking of > any action in reliance upon the message is strictly prohibited. If you have > received this communication in error, please notify the sender immediately. > Thank you. > > > ------------------------------ > THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS > CONFIDENTIAL and may contain information that is privileged and exempt from > disclosure under applicable law. If you are neither the intended recipient > nor responsible for delivering the message to the intended recipient, > please note that any dissemination, distribution, copying or the taking of > any action in reliance upon the message is strictly prohibited. If you have > received this communication in error, please notify the sender immediately. > Thank you. > --001a11c2f70e61342905231be058 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hi Ryan,

The simple query is running fi= ne as shown below:

hive> =C2=A0 SELECT tr2.id, tr2.possibly_sensitive
=C2=A0 = =C2=A0 > =C2=A0 FROM tweets_raw tr1
=C2=A0 =C2=A0 > =C2=A0 = LATERAL VIEW json_tuple(tr1.text_col, 'id', 'extended_entities&= #39;, 'possibly_sensitive') tr2 as id, extended_entities, possibly_= sensitive
=C2=A0 =C2=A0 > =C2=A0 where tr2.id=3D'654395184428515332'
=C2=A0 =C2=A0 > = =C2=A0 LIMIT 1;
OK
654395184428515332 =C2=A0 =C2=A0 =C2= =A0false
Time taken: 1.813 seconds, Fetched: 1 row(s)
h= ive>


However, if I try to = get any data from the json array it's failing.

Thanks,
Joel

On Tue, Oct 27, 2015 at 4:21 PM, Ryan Harris <Ryan.Harris@zionsbancorp.com> wrote:

looking at your sample da= ta, you shouldn't need to use lateral view explode unless you are tryin= g to get 1 entry per row for =C2=A0your media sizes (thumb, small, large, medium, etc) ...

=C2=A0

Try starting with somethi= ng simple like :

=C2=A0

SELECT get_json_object(text_col, '$.id&#= 39;) as id FROM tweets_raw limit 10;

=C2=A0

You should also be able t= o use json_tuple(), but start simple....

=C2=A0

From: Sam Joe = [mailto:games2= 013.sam@gmail.com]
Sent: Tuesday, October 27, 2015 1:43 PM


To: user@h= ive.apache.org
Subject: Re: Using json_tuple for Nested json Arrays

=C2=A0

Hi Ryan,

=C2=A0

Thanks for your reply! I didn't try using=C2=A0= =C2=A0get_json_object() UDF. I will try using that and let you know the res= ults.

=C2=A0

I tried using the following script which failed :=

=C2=A0

=C2=A0 SELECT tr2.id, tr2.possibly_sensitive, tr3.media, media_object.source_user_id<= /span>

=C2=A0 FROM tweet= s_raw tr1

=C2=A0 LATERAL VI= EW json_tuple(tr1.text_col, 'id', 'extended_entities', '= ;possibly_sensitive') tr2 as id, extended_entities, possibly_sensitive<= /span>

=C2=A0 LATERAL VI= EW json_tuple(tr2.extended_entities, 'media') tr3 as media

=C2=A0 LATERAL VI= EW EXPLODE(tr3.media) media_exploded as media_object

=C2=A0 where tr2.id=3D'654395184428515332'

=C2=A0 LIMIT 1;

=C2=A0

FAILED: UDFArgume= ntException explode() takes an array or a map as a parameter<= u>

=C2=A0

=C2=A0

Thanks,

Joel

=C2=A0

On Tue, Oct 27, 2015 at 3:37 PM, Ryan Harris <Ryan.Harris@= zionsbancorp.com> wrote:

Do you have an example of= the query that you tried (which failed).

In short, you probably wa= nt to use the get_json_object() UDF:

https://cwiki.apache.org/confluence/display= /Hive/LanguageManual+UDF#LanguageManualUDF-get_json_object

=C2=A0

if you need the JSON arra= y broken into individual records, you might require lateral view explode as in these examples:

http://mechanics.flite.com/blog/2014/04/16/using-explode-and-l= ateral-view-in-hive/

http://stackoverflow.com/questions/28716165/how-t= o-query-struct-array-with-hive-get-json-object

=C2=A0

=C2=A0

From: Sam Joe = [mailto:games2= 013.sam@gmail.com]
Sent: Tuesday, October 27, 2015 1:29 PM
To: user@h= ive.apache.org
Subject: Re: Using json_tuple for Nested json Arrays

=C2=A0

I tried using EXPLODE function on the nested json ar= ray but it doesn't work and throws following error:

=C2=A0

FAILED: UDFArgumentException explode() takes an arra= y or a map as a parameter

=C2=A0

=C2=A0

Thanks,

Joel

=C2=A0

On Tue, Oct 27, 2015 at 3:20 PM, Sam Joe <games2013.sam@gmail.c= om> wrote:

Hi,

=C2=A0

Is it possible to use json_tuple function to extract= data from json arrays (nested too). I am trying to process json data as st= ring and avoid using serdes since user data may be malformed.

=C2=A0

Please see a sample json data given below:=

=C2=A0

=C2=A0

{

"filter_level": "low",=

"retweeted": false,

"in_reply_to_screen_name": null,=

"possibly_sensitive": false,=

"truncated": false,

"lang": "en",

"in_reply_to_status_id_str": null,<= u>

"id": 654395184428515332,

"extended_entities": {

"media": [{

"sizes": {

"thumb": {

"w": 150,

"resize": "crop",<= /p>

"h": 150

},

"small": {

"w": 340,

"resize": "fit",

"h": 255

},

"large": {

"w": 1024,

"resize": "fit",

"h": 768

},

"medium": {

"w": 600,

"resize": "fit",

"h": 450

}

},

"source_user_id": 16864598,<= /p>

"media_url": "http://pbs.twimg.com/= media/CRSL2MPWsAAOnZo.jpg",

"display_url": "pic.twitter.com/i3004WyF4g&q= uot;,

"type": "photo",

"url": "http://t.co/i3004WyF4g",

"id": 654301608990388224,

"media_url_https": "https://pbs.tw= img.com/media/CRSL2MPWsAAOnZo.jpg",

"expanded_url": "http://twitter.com/lordlancaster/status/654301626665189376/photo/1&qu= ot;,

"source_user_id_str": "16864598"= ,

"indices": [143,

144],

"source_status_id_str": "654301626665= 189376",

"source_status_id": 654301626665189376,=

"id_str": "654301608990388224"

},

{

"sizes": {

"thumb": {

"w": 150,

"resize": "crop",<= /p>

"h": 150

},

"small": {

"w": 340,

"resize": "fit",

"h": 255

},

"large": {

"w": 1024,

"resize": "fit",

"h": 768

},

"medium": {

"w": 600,

"resize": "fit",

"h": 450

}

},

"source_user_id": 16864598,<= /p>

"media_url": "http://pbs.twimg.com/= media/CRSL2MRWgAAGOcj.jpg",

"display_url": "pic.twitter.com/i3004WyF4g&q= uot;,

"type": "photo",

"url": "http://t.co/i3004WyF4g",

"id": 654301608998764544,

"media_url_https": "https://pbs.tw= img.com/media/CRSL2MRWgAAGOcj.jpg",

"expanded_url": "http://twitter.com/lordlancaster/status/654301626665189376/photo/1&qu= ot;,

"source_user_id_str": "16864598"= ,

"indices": [143,

144],

"source_status_id_str": "654301626665= 189376",

"source_status_id": 654301626665189376,=

"id_str": "654301608998764544"

},

{

"sizes": {

"thumb": {

"w": 150,

"resize": "crop",<= /p>

"h": 150

},

"small": {

"w": 340,

"resize": "fit",

"h": 255

},

"large": {

"w": 1024,

"resize": "fit",

"h": 768

},

"medium": {

"w": 600,

"resize": "fit",

"h": 450

}

},

"source_user_id": 16864598,<= /p>

"media_url": "http://pbs.twimg.com/= media/CRSL2MQWwAAP4Qo.jpg",

"display_url": "pic.twitter.com/i3004WyF4g&q= uot;,

"type": "photo",

"url": "http://t.co/i3004WyF4g",

"id": 654301608994586624,

"media_url_https": "https://pbs.tw= img.com/media/CRSL2MQWwAAP4Qo.jpg",

"expanded_url": "http://twitter.com/lordlancaster/status/654301626665189376/photo/1&qu= ot;,

"source_user_id_str": "16864598"= ,

"indices": [143,

144],

"source_status_id_str": "654301626665= 189376",

"source_status_id": 654301626665189376,=

"id_str": "654301608994586624"

},

{

"sizes": {

"thumb": {

"w": 150,

"resize": "crop",<= /p>

"h": 150

},

"small": {

"w": 340,

"resize": "fit",

"h": 255

},

"large": {

"w": 1024,

"resize": "fit",

"h": 768

},

"medium": {

"w": 600,

"resize": "fit",

"h": 450

}

},

"source_user_id": 16864598,<= /p>

"media_url": "http://pbs.twimg.com/= media/CRSL2M8WcAEXowZ.jpg",

"display_url": "pic.twitter.com/i3004WyF4g&q= uot;,

"type": "photo",

"url": "http://t.co/i3004WyF4g",

"id": 654301609179115521,

"media_url_https": "https://pbs.tw= img.com/media/CRSL2M8WcAEXowZ.jpg",

"expanded_url": "http://twitter.com/lordlancaster/status/654301626665189376/photo/1&qu= ot;,

"source_user_id_str": "16864598"= ,

"indices": [143,

144],

"source_status_id_str": "654301626665= 189376",

"source_status_id": 654301626665189376,=

"id_str": "654301609179115521"

}]

}

}

=C2=A0

Appreciate any help!

=C2=A0

Thanks,

Joel

=C2=A0

=C2=A0

=C2=A0


THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING = DOCUMENTS, IS CONFIDENTIAL and may contain information that is privileged a= nd exempt from disclosure under applicable law. If you are neither the inte= nded recipient nor responsible for delivering the message to the intended recipient, please note that any dis= semination, distribution, copying or the taking of any action in reliance u= pon the message is strictly prohibited. If you have received this communica= tion in error, please notify the sender immediately. Thank you.

=C2=A0


THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS CONFI= DENTIAL and may contain information that is privileged and exempt from disc= losure under applicable law. If you are neither the intended recipient nor = responsible for delivering the message to the intended recipient, please no= te that any dissemination, distribution, copying or the taking of any actio= n in reliance upon the message is strictly prohibited. If you have received= this communication in error, please notify the sender immediately. Thank = you.

--001a11c2f70e61342905231be058--