hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sam Joe <games2013....@gmail.com>
Subject Re: Using json_tuple for Nested json Arrays
Date Tue, 27 Oct 2015 20:25:19 GMT
Hi Ryan,

The simple query is running fine as shown below:

hive>   SELECT tr2.id, tr2.possibly_sensitive
    >   FROM tweets_raw tr1
    >   LATERAL VIEW json_tuple(tr1.text_col, 'id', 'extended_entities',
'possibly_sensitive') tr2 as id, extended_entities, possibly_sensitive
    >   where tr2.id='654395184428515332'
    >   LIMIT 1;
OK
654395184428515332      false
Time taken: 1.813 seconds, Fetched: 1 row(s)
hive>


However, if I try to get any data from the json array it's failing.

Thanks,
Joel

On Tue, Oct 27, 2015 at 4:21 PM, Ryan Harris <Ryan.Harris@zionsbancorp.com>
wrote:

> looking at your sample data, you shouldn't need to use lateral view
> explode unless you are trying to get 1 entry per row for  your media sizes
> (thumb, small, large, medium, etc) ...
>
>
>
> Try starting with something simple like :
>
>
>
> SELECT get_json_object(text_col, '$.id') as id FROM tweets_raw limit 10;
>
>
>
> You should also be able to use json_tuple(), but start simple....
>
>
>
> *From:* Sam Joe [mailto:games2013.sam@gmail.com]
> *Sent:* Tuesday, October 27, 2015 1:43 PM
>
> *To:* user@hive.apache.org
> *Subject:* Re: Using json_tuple for Nested json Arrays
>
>
>
> Hi Ryan,
>
>
>
> Thanks for your reply! I didn't try using  get_json_object() UDF. I will
> try using that and let you know the results.
>
>
>
> I tried using the following script which failed :
>
>
>
>   SELECT tr2.id, tr2.possibly_sensitive, tr3.media,
> media_object.source_user_id
>
>   FROM tweets_raw tr1
>
>   LATERAL VIEW json_tuple(tr1.text_col, 'id', 'extended_entities',
> 'possibly_sensitive') tr2 as id, extended_entities, possibly_sensitive
>
>   LATERAL VIEW json_tuple(tr2.extended_entities, 'media') tr3 as media
>
>   LATERAL VIEW EXPLODE(tr3.media) media_exploded as media_object
>
>   where tr2.id='654395184428515332'
>
>   LIMIT 1;
>
>
>
> FAILED: UDFArgumentException explode() takes an array or a map as a
> parameter
>
>
>
>
>
> Thanks,
>
> Joel
>
>
>
> On Tue, Oct 27, 2015 at 3:37 PM, Ryan Harris <Ryan.Harris@zionsbancorp.com>
> wrote:
>
> Do you have an example of the query that you tried (which failed).
>
> In short, you probably want to use the get_json_object() UDF:
>
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-get_json_object
>
>
>
> if you need the JSON array broken into individual records, you might
> require lateral view explode as in these examples:
>
>
> http://mechanics.flite.com/blog/2014/04/16/using-explode-and-lateral-view-in-hive/
>
>
> http://stackoverflow.com/questions/28716165/how-to-query-struct-array-with-hive-get-json-object
>
>
>
>
>
> *From:* Sam Joe [mailto:games2013.sam@gmail.com]
> *Sent:* Tuesday, October 27, 2015 1:29 PM
> *To:* user@hive.apache.org
> *Subject:* Re: Using json_tuple for Nested json Arrays
>
>
>
> I tried using EXPLODE function on the nested json array but it doesn't
> work and throws following error:
>
>
>
> FAILED: UDFArgumentException explode() takes an array or a map as a
> parameter
>
>
>
>
>
> Thanks,
>
> Joel
>
>
>
> On Tue, Oct 27, 2015 at 3:20 PM, Sam Joe <games2013.sam@gmail.com> wrote:
>
> Hi,
>
>
>
> Is it possible to use json_tuple function to extract data from json arrays
> (nested too). I am trying to process json data as string and avoid using
> serdes since user data may be malformed.
>
>
>
> Please see a sample json data given below:
>
>
>
>
>
> {
>
> "filter_level": "low",
>
> "retweeted": false,
>
> "in_reply_to_screen_name": null,
>
> "possibly_sensitive": false,
>
> "truncated": false,
>
> "lang": "en",
>
> "in_reply_to_status_id_str": null,
>
> "id": 654395184428515332,
>
> "extended_entities": {
>
> "media": [{
>
> "sizes": {
>
> "thumb": {
>
> "w": 150,
>
> "resize": "crop",
>
> "h": 150
>
> },
>
> "small": {
>
> "w": 340,
>
> "resize": "fit",
>
> "h": 255
>
> },
>
> "large": {
>
> "w": 1024,
>
> "resize": "fit",
>
> "h": 768
>
> },
>
> "medium": {
>
> "w": 600,
>
> "resize": "fit",
>
> "h": 450
>
> }
>
> },
>
> "source_user_id": 16864598,
>
> "media_url": "http://pbs.twimg.com/media/CRSL2MPWsAAOnZo.jpg",
>
> "display_url": "pic.twitter.com/i3004WyF4g",
>
> "type": "photo",
>
> "url": "http://t.co/i3004WyF4g",
>
> "id": 654301608990388224,
>
> "media_url_https": "https://pbs.twimg.com/media/CRSL2MPWsAAOnZo.jpg",
>
> "expanded_url": "
> http://twitter.com/lordlancaster/status/654301626665189376/photo/1",
>
> "source_user_id_str": "16864598",
>
> "indices": [143,
>
> 144],
>
> "source_status_id_str": "654301626665189376",
>
> "source_status_id": 654301626665189376,
>
> "id_str": "654301608990388224"
>
> },
>
> {
>
> "sizes": {
>
> "thumb": {
>
> "w": 150,
>
> "resize": "crop",
>
> "h": 150
>
> },
>
> "small": {
>
> "w": 340,
>
> "resize": "fit",
>
> "h": 255
>
> },
>
> "large": {
>
> "w": 1024,
>
> "resize": "fit",
>
> "h": 768
>
> },
>
> "medium": {
>
> "w": 600,
>
> "resize": "fit",
>
> "h": 450
>
> }
>
> },
>
> "source_user_id": 16864598,
>
> "media_url": "http://pbs.twimg.com/media/CRSL2MRWgAAGOcj.jpg",
>
> "display_url": "pic.twitter.com/i3004WyF4g",
>
> "type": "photo",
>
> "url": "http://t.co/i3004WyF4g",
>
> "id": 654301608998764544,
>
> "media_url_https": "https://pbs.twimg.com/media/CRSL2MRWgAAGOcj.jpg",
>
> "expanded_url": "
> http://twitter.com/lordlancaster/status/654301626665189376/photo/1",
>
> "source_user_id_str": "16864598",
>
> "indices": [143,
>
> 144],
>
> "source_status_id_str": "654301626665189376",
>
> "source_status_id": 654301626665189376,
>
> "id_str": "654301608998764544"
>
> },
>
> {
>
> "sizes": {
>
> "thumb": {
>
> "w": 150,
>
> "resize": "crop",
>
> "h": 150
>
> },
>
> "small": {
>
> "w": 340,
>
> "resize": "fit",
>
> "h": 255
>
> },
>
> "large": {
>
> "w": 1024,
>
> "resize": "fit",
>
> "h": 768
>
> },
>
> "medium": {
>
> "w": 600,
>
> "resize": "fit",
>
> "h": 450
>
> }
>
> },
>
> "source_user_id": 16864598,
>
> "media_url": "http://pbs.twimg.com/media/CRSL2MQWwAAP4Qo.jpg",
>
> "display_url": "pic.twitter.com/i3004WyF4g",
>
> "type": "photo",
>
> "url": "http://t.co/i3004WyF4g",
>
> "id": 654301608994586624,
>
> "media_url_https": "https://pbs.twimg.com/media/CRSL2MQWwAAP4Qo.jpg",
>
> "expanded_url": "
> http://twitter.com/lordlancaster/status/654301626665189376/photo/1",
>
> "source_user_id_str": "16864598",
>
> "indices": [143,
>
> 144],
>
> "source_status_id_str": "654301626665189376",
>
> "source_status_id": 654301626665189376,
>
> "id_str": "654301608994586624"
>
> },
>
> {
>
> "sizes": {
>
> "thumb": {
>
> "w": 150,
>
> "resize": "crop",
>
> "h": 150
>
> },
>
> "small": {
>
> "w": 340,
>
> "resize": "fit",
>
> "h": 255
>
> },
>
> "large": {
>
> "w": 1024,
>
> "resize": "fit",
>
> "h": 768
>
> },
>
> "medium": {
>
> "w": 600,
>
> "resize": "fit",
>
> "h": 450
>
> }
>
> },
>
> "source_user_id": 16864598,
>
> "media_url": "http://pbs.twimg.com/media/CRSL2M8WcAEXowZ.jpg",
>
> "display_url": "pic.twitter.com/i3004WyF4g",
>
> "type": "photo",
>
> "url": "http://t.co/i3004WyF4g",
>
> "id": 654301609179115521,
>
> "media_url_https": "https://pbs.twimg.com/media/CRSL2M8WcAEXowZ.jpg",
>
> "expanded_url": "
> http://twitter.com/lordlancaster/status/654301626665189376/photo/1",
>
> "source_user_id_str": "16864598",
>
> "indices": [143,
>
> 144],
>
> "source_status_id_str": "654301626665189376",
>
> "source_status_id": 654301626665189376,
>
> "id_str": "654301609179115521"
>
> }]
>
> }
>
> }
>
>
>
> Appreciate any help!
>
>
>
> *Thanks,*
>
> *Joel*
>
>
>
>
>
>
> ------------------------------
>
> THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS
> CONFIDENTIAL and may contain information that is privileged and exempt from
> disclosure under applicable law. If you are neither the intended recipient
> nor responsible for delivering the message to the intended recipient,
> please note that any dissemination, distribution, copying or the taking of
> any action in reliance upon the message is strictly prohibited. If you have
> received this communication in error, please notify the sender immediately.
> Thank you.
>
>
> ------------------------------
> THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS
> CONFIDENTIAL and may contain information that is privileged and exempt from
> disclosure under applicable law. If you are neither the intended recipient
> nor responsible for delivering the message to the intended recipient,
> please note that any dissemination, distribution, copying or the taking of
> any action in reliance upon the message is strictly prohibited. If you have
> received this communication in error, please notify the sender immediately.
> Thank you.
>

Mime
View raw message