hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sam Joe <games2013....@gmail.com>
Subject Re: Using json_tuple for Nested json Arrays
Date Wed, 28 Oct 2015 01:36:52 GMT
Hi Ryan,

I think tr3.media  a complex json array having nested json tuple objects.
For example, sizes is a json tuple object present inside the array which I
think the function EXPLODE is not expecting. May be the explode function is
expecting a closing brace '}' corresponding to the first brace '{' instead
of another opening brace.

So, could this be a bug?

Thanks,
Joel

On Tue, Oct 27, 2015 at 5:22 PM, Ryan Harris <Ryan.Harris@zionsbancorp.com>
wrote:

> hmmm...I'm not sure what the return value type of json_tuple is...
>
> I'd probably try creating a temporary table from your working query below
> and then work on getting the lateral view explode to work against the temp
> table.
>
>
>
> FAILED: UDFArgumentException explode() takes an array or a map as a
> parameter
>
>  Apparently, hive doesn't think tr3.media is an array or map..so what is
> it?
>
>
>
> *From:* Sam Joe [mailto:games2013.sam@gmail.com]
> *Sent:* Tuesday, October 27, 2015 2:56 PM
>
> *To:* user@hive.apache.org
> *Subject:* Re: Using json_tuple for Nested json Arrays
>
>
>
> Hi Ryan,
>
>
>
> The statement returns null for media as shown below:
>
>
>
> hive> SELECT tr2.id, tr2.possibly_sensitive, tr2.media
>
>     >   FROM tweets_raw tr1
>
>     >   LATERAL VIEW json_tuple(tr1.text_col, 'id', 'extended_entities',
> 'possibly_sensitive', 'extended_entities.media') tr2 as id,
> extended_entities, possibly_sensitive, media
>
>     >   where tr2.id='654395184428515332'
>
>     >   LIMIT 1;
>
> OK
>
> 654395184428515332      false   NULL
>
>
>
>
>
> I am able to return the media json array using the following script:
>
>
>
> hive>   SELECT tr2.id, tr2.possibly_sensitive, tr3.media
>
>     >   FROM tweets_raw tr1
>
>     >   LATERAL VIEW json_tuple(tr1.text_col, 'id', 'extended_entities',
> 'possibly_sensitive') tr2 as id, extended_entities, possibly_sensitive
>
>     >   LATERAL VIEW json_tuple(tr2.extended_entities, 'media') tr3 as
> media
>
>     >   where tr2.id='654395184428515332'
>
>     >   LIMIT 1;
>
> OK
>
> 654395184428515332      false
> [{"sizes":{"thumb":{"w":150,"resize":"crop","h":150},"small":{"w":340,"resize":"fit","h":255},"large":{"w":1024,"resize":"fit","h":768},"medium":{"w":600,"resize":"fit","h":450}},"source_user_id":16864598,"media_url":"
> http://pbs.twimg.com/media/CRSL2MPWsAAOnZo.jpg","display_url":"
> pic.twitter.com/i3004WyF4g","type":"photo","url":"http://t.co/i3004WyF4g
> ","id":654301608990388224,"media_url_https":"
> https://pbs.twimg.com/media/CRSL2MPWsAAOnZo.jpg","expanded_url":"
> http://twitter.com/lordlancaster/status/654301626665189376/photo/1
> ","source_user_id_str":"16864598","indices":[143,144],"source_status_id_str":"654301626665189376","source_status_id":654301626665189376,"id_str":"654301608990388224"},{"sizes":{"thumb":{"w":150,"resize":"crop","h":150},"small":{"w":340,"resize":"fit","h":255},"large":{"w":1024,"resize":"fit","h":768},"medium":{"w":600,"resize":"fit","h":450}},"source_user_id":16864598,"media_url":"
> http://pbs.twimg.com/media/CRSL2MRWgAAGOcj.jpg","display_url":"
> pic.twitter.com/i3004WyF4g","type":"photo","url":"http://t.co/i3004WyF4g
> ","id":654301608998764544,"media_url_https":"
> https://pbs.twimg.com/media/CRSL2MRWgAAGOcj.jpg","expanded_url":"
> http://twitter.com/lordlancaster/status/654301626665189376/photo/1
> ","source_user_id_str":"16864598","indices":[143,144],"source_status_id_str":"654301626665189376","source_status_id":654301626665189376,"id_str":"654301608998764544"},{"sizes":{"thumb":{"w":150,"resize":"crop","h":150},"small":{"w":340,"resize":"fit","h":255},"large":{"w":1024,"resize":"fit","h":768},"medium":{"w":600,"resize":"fit","h":450}},"source_user_id":16864598,"media_url":"
> http://pbs.twimg.com/media/CRSL2MQWwAAP4Qo.jpg","display_url":"
> pic.twitter.com/i3004WyF4g","type":"photo","url":"http://t.co/i3004WyF4g
> ","id":654301608994586624,"media_url_https":"
> https://pbs.twimg.com/media/CRSL2MQWwAAP4Qo.jpg","expanded_url":"
> http://twitter.com/lordlancaster/status/654301626665189376/photo/1
> ","source_user_id_str":"16864598","indices":[143,144],"source_status_id_str":"654301626665189376","source_status_id":654301626665189376,"id_str":"654301608994586624"},{"sizes":{"thumb":{"w":150,"resize":"crop","h":150},"small":{"w":340,"resize":"fit","h":255},"large":{"w":1024,"resize":"fit","h":768},"medium":{"w":600,"resize":"fit","h":450}},"source_user_id":16864598,"media_url":"
> http://pbs.twimg.com/media/CRSL2M8WcAEXowZ.jpg","display_url":"
> pic.twitter.com/i3004WyF4g","type":"photo","url":"http://t.co/i3004WyF4g
> ","id":654301609179115521,"media_url_https":"
> https://pbs.twimg.com/media/CRSL2M8WcAEXowZ.jpg","expanded_url":"
> http://twitter.com/lordlancaster/status/654301626665189376/photo/1
> ","source_user_id_str":"16864598","indices":[143,144],"source_status_id_str":"654301626665189376","source_status_id":654301626665189376,"id_str":"654301609179115521"}]
>
> Time taken: 0.191 seconds, Fetched: 1 row(s)
>
> hive>
>
>
>
>
>
>
>
> However, I would like to know how I can access the json tuples present
> inside the media json array.
>
>
>
> Appreciate your help!
>
>
>
> Thanks,
>
> Joel
>
>
>
>
>
>
>
> On Tue, Oct 27, 2015 at 4:44 PM, Ryan Harris <Ryan.Harris@zionsbancorp.com>
> wrote:
>
> I see where you are going with this now....
>
>
>
> Not sure if you might be bumping into this bug:
>
> https://issues.apache.org/jira/browse/HIVE-1575
>
>
>
> since this line
>
>   LATERAL VIEW json_tuple(tr2.extended_entities, 'media') tr3 as media
>
> pulls the JSON array as a "top-level" object...
>
>
>
> does this not work?
>
>   LATERAL VIEW json_tuple(tr1.text_col, 'id', 'extended_entities',
> 'possibly_sensitive', 'extended_entities.media') tr2 as id,
> extended_entities, possibly_sensitive, media
>
>
>
>
>
> *From:* Sam Joe [mailto:games2013.sam@gmail.com]
> *Sent:* Tuesday, October 27, 2015 2:25 PM
>
>
> *To:* user@hive.apache.org
> *Subject:* Re: Using json_tuple for Nested json Arrays
>
>
>
> Hi Ryan,
>
>
>
> The simple query is running fine as shown below:
>
>
>
> hive>   SELECT tr2.id, tr2.possibly_sensitive
>
>     >   FROM tweets_raw tr1
>
>     >   LATERAL VIEW json_tuple(tr1.text_col, 'id', 'extended_entities',
> 'possibly_sensitive') tr2 as id, extended_entities, possibly_sensitive
>
>     >   where tr2.id='654395184428515332'
>
>     >   LIMIT 1;
>
> OK
>
> 654395184428515332      false
>
> Time taken: 1.813 seconds, Fetched: 1 row(s)
>
> hive>
>
>
>
>
>
> However, if I try to get any data from the json array it's failing.
>
>
>
> Thanks,
>
> Joel
>
>
>
> On Tue, Oct 27, 2015 at 4:21 PM, Ryan Harris <Ryan.Harris@zionsbancorp.com>
> wrote:
>
> looking at your sample data, you shouldn't need to use lateral view
> explode unless you are trying to get 1 entry per row for  your media sizes
> (thumb, small, large, medium, etc) ...
>
>
>
> Try starting with something simple like :
>
>
>
> SELECT get_json_object(text_col, '$.id') as id FROM tweets_raw limit 10;
>
>
>
> You should also be able to use json_tuple(), but start simple....
>
>
>
> *From:* Sam Joe [mailto:games2013.sam@gmail.com]
> *Sent:* Tuesday, October 27, 2015 1:43 PM
>
>
> *To:* user@hive.apache.org
> *Subject:* Re: Using json_tuple for Nested json Arrays
>
>
>
> Hi Ryan,
>
>
>
> Thanks for your reply! I didn't try using  get_json_object() UDF. I will
> try using that and let you know the results.
>
>
>
> I tried using the following script which failed :
>
>
>
>   SELECT tr2.id, tr2.possibly_sensitive, tr3.media,
> media_object.source_user_id
>
>   FROM tweets_raw tr1
>
>   LATERAL VIEW json_tuple(tr1.text_col, 'id', 'extended_entities',
> 'possibly_sensitive') tr2 as id, extended_entities, possibly_sensitive
>
>   LATERAL VIEW json_tuple(tr2.extended_entities, 'media') tr3 as media
>
>   LATERAL VIEW EXPLODE(tr3.media) media_exploded as media_object
>
>   where tr2.id='654395184428515332'
>
>   LIMIT 1;
>
>
>
> FAILED: UDFArgumentException explode() takes an array or a map as a
> parameter
>
>
>
>
>
> Thanks,
>
> Joel
>
>
>
> On Tue, Oct 27, 2015 at 3:37 PM, Ryan Harris <Ryan.Harris@zionsbancorp.com>
> wrote:
>
> Do you have an example of the query that you tried (which failed).
>
> In short, you probably want to use the get_json_object() UDF:
>
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-get_json_object
>
>
>
> if you need the JSON array broken into individual records, you might
> require lateral view explode as in these examples:
>
>
> http://mechanics.flite.com/blog/2014/04/16/using-explode-and-lateral-view-in-hive/
>
>
> http://stackoverflow.com/questions/28716165/how-to-query-struct-array-with-hive-get-json-object
>
>
>
>
>
> *From:* Sam Joe [mailto:games2013.sam@gmail.com]
> *Sent:* Tuesday, October 27, 2015 1:29 PM
> *To:* user@hive.apache.org
> *Subject:* Re: Using json_tuple for Nested json Arrays
>
>
>
> I tried using EXPLODE function on the nested json array but it doesn't
> work and throws following error:
>
>
>
> FAILED: UDFArgumentException explode() takes an array or a map as a
> parameter
>
>
>
>
>
> Thanks,
>
> Joel
>
>
>
> On Tue, Oct 27, 2015 at 3:20 PM, Sam Joe <games2013.sam@gmail.com> wrote:
>
> Hi,
>
>
>
> Is it possible to use json_tuple function to extract data from json arrays
> (nested too). I am trying to process json data as string and avoid using
> serdes since user data may be malformed.
>
>
>
> Please see a sample json data given below:
>
>
>
>
>
> {
>
> "filter_level": "low",
>
> "retweeted": false,
>
> "in_reply_to_screen_name": null,
>
> "possibly_sensitive": false,
>
> "truncated": false,
>
> "lang": "en",
>
> "in_reply_to_status_id_str": null,
>
> "id": 654395184428515332,
>
> "extended_entities": {
>
> "media": [{
>
> "sizes": {
>
> "thumb": {
>
> "w": 150,
>
> "resize": "crop",
>
> "h": 150
>
> },
>
> "small": {
>
> "w": 340,
>
> "resize": "fit",
>
> "h": 255
>
> },
>
> "large": {
>
> "w": 1024,
>
> "resize": "fit",
>
> "h": 768
>
> },
>
> "medium": {
>
> "w": 600,
>
> "resize": "fit",
>
> "h": 450
>
> }
>
> },
>
> "source_user_id": 16864598,
>
> "media_url": "http://pbs.twimg.com/media/CRSL2MPWsAAOnZo.jpg",
>
> "display_url": "pic.twitter.com/i3004WyF4g",
>
> "type": "photo",
>
> "url": "http://t.co/i3004WyF4g",
>
> "id": 654301608990388224,
>
> "media_url_https": "https://pbs.twimg.com/media/CRSL2MPWsAAOnZo.jpg",
>
> "expanded_url": "
> http://twitter.com/lordlancaster/status/654301626665189376/photo/1",
>
> "source_user_id_str": "16864598",
>
> "indices": [143,
>
> 144],
>
> "source_status_id_str": "654301626665189376",
>
> "source_status_id": 654301626665189376,
>
> "id_str": "654301608990388224"
>
> },
>
> {
>
> "sizes": {
>
> "thumb": {
>
> "w": 150,
>
> "resize": "crop",
>
> "h": 150
>
> },
>
> "small": {
>
> "w": 340,
>
> "resize": "fit",
>
> "h": 255
>
> },
>
> "large": {
>
> "w": 1024,
>
> "resize": "fit",
>
> "h": 768
>
> },
>
> "medium": {
>
> "w": 600,
>
> "resize": "fit",
>
> "h": 450
>
> }
>
> },
>
> "source_user_id": 16864598,
>
> "media_url": "http://pbs.twimg.com/media/CRSL2MRWgAAGOcj.jpg",
>
> "display_url": "pic.twitter.com/i3004WyF4g",
>
> "type": "photo",
>
> "url": "http://t.co/i3004WyF4g",
>
> "id": 654301608998764544,
>
> "media_url_https": "https://pbs.twimg.com/media/CRSL2MRWgAAGOcj.jpg",
>
> "expanded_url": "
> http://twitter.com/lordlancaster/status/654301626665189376/photo/1",
>
> "source_user_id_str": "16864598",
>
> "indices": [143,
>
> 144],
>
> "source_status_id_str": "654301626665189376",
>
> "source_status_id": 654301626665189376,
>
> "id_str": "654301608998764544"
>
> },
>
> {
>
> "sizes": {
>
> "thumb": {
>
> "w": 150,
>
> "resize": "crop",
>
> "h": 150
>
> },
>
> "small": {
>
> "w": 340,
>
> "resize": "fit",
>
> "h": 255
>
> },
>
> "large": {
>
> "w": 1024,
>
> "resize": "fit",
>
> "h": 768
>
> },
>
> "medium": {
>
> "w": 600,
>
> "resize": "fit",
>
> "h": 450
>
> }
>
> },
>
> "source_user_id": 16864598,
>
> "media_url": "http://pbs.twimg.com/media/CRSL2MQWwAAP4Qo.jpg",
>
> "display_url": "pic.twitter.com/i3004WyF4g",
>
> "type": "photo",
>
> "url": "http://t.co/i3004WyF4g",
>
> "id": 654301608994586624,
>
> "media_url_https": "https://pbs.twimg.com/media/CRSL2MQWwAAP4Qo.jpg",
>
> "expanded_url": "
> http://twitter.com/lordlancaster/status/654301626665189376/photo/1",
>
> "source_user_id_str": "16864598",
>
> "indices": [143,
>
> 144],
>
> "source_status_id_str": "654301626665189376",
>
> "source_status_id": 654301626665189376,
>
> "id_str": "654301608994586624"
>
> },
>
> {
>
> "sizes": {
>
> "thumb": {
>
> "w": 150,
>
> "resize": "crop",
>
> "h": 150
>
> },
>
> "small": {
>
> "w": 340,
>
> "resize": "fit",
>
> "h": 255
>
> },
>
> "large": {
>
> "w": 1024,
>
> "resize": "fit",
>
> "h": 768
>
> },
>
> "medium": {
>
> "w": 600,
>
> "resize": "fit",
>
> "h": 450
>
> }
>
> },
>
> "source_user_id": 16864598,
>
> "media_url": "http://pbs.twimg.com/media/CRSL2M8WcAEXowZ.jpg",
>
> "display_url": "pic.twitter.com/i3004WyF4g",
>
> "type": "photo",
>
> "url": "http://t.co/i3004WyF4g",
>
> "id": 654301609179115521,
>
> "media_url_https": "https://pbs.twimg.com/media/CRSL2M8WcAEXowZ.jpg",
>
> "expanded_url": "
> http://twitter.com/lordlancaster/status/654301626665189376/photo/1",
>
> "source_user_id_str": "16864598",
>
> "indices": [143,
>
> 144],
>
> "source_status_id_str": "654301626665189376",
>
> "source_status_id": 654301626665189376,
>
> "id_str": "654301609179115521"
>
> }]
>
> }
>
> }
>
>
>
> Appreciate any help!
>
>
>
> *Thanks,*
>
> *Joel*
>
>
>
>
>
>
> ------------------------------
>
> THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS
> CONFIDENTIAL and may contain information that is privileged and exempt from
> disclosure under applicable law. If you are neither the intended recipient
> nor responsible for delivering the message to the intended recipient,
> please note that any dissemination, distribution, copying or the taking of
> any action in reliance upon the message is strictly prohibited. If you have
> received this communication in error, please notify the sender immediately.
> Thank you.
>
>
> ------------------------------
>
> THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS
> CONFIDENTIAL and may contain information that is privileged and exempt from
> disclosure under applicable law. If you are neither the intended recipient
> nor responsible for delivering the message to the intended recipient,
> please note that any dissemination, distribution, copying or the taking of
> any action in reliance upon the message is strictly prohibited. If you have
> received this communication in error, please notify the sender immediately.
> Thank you.
>
>
> ------------------------------
>
> THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS
> CONFIDENTIAL and may contain information that is privileged and exempt from
> disclosure under applicable law. If you are neither the intended recipient
> nor responsible for delivering the message to the intended recipient,
> please note that any dissemination, distribution, copying or the taking of
> any action in reliance upon the message is strictly prohibited. If you have
> received this communication in error, please notify the sender immediately.
> Thank you.
>
>
> ------------------------------
> THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS
> CONFIDENTIAL and may contain information that is privileged and exempt from
> disclosure under applicable law. If you are neither the intended recipient
> nor responsible for delivering the message to the intended recipient,
> please note that any dissemination, distribution, copying or the taking of
> any action in reliance upon the message is strictly prohibited. If you have
> received this communication in error, please notify the sender immediately.
> Thank you.
>

Mime
View raw message