hive-user mailing list archives

From java8964 <java8...@hotmail.com>
Subject RE: get_json_object for nested field returning a String instead of an Array
Date Tue, 08 Apr 2014 00:07:21 GMT
Hi, Narayanan:
The problem is that a generic solution has no way of knowing that a given element in the JSON is an array. Keep in mind that any element of a JSON document can hold any valid structure: an array, another struct, a map, etc.
You know your data, so you can say that at this level it is an array. But the computer doesn't know that, which is why you need to provide a schema.
Think about it: in programming we could just cast the value to an array, but that is normally NOT a good solution. So a generic solution, like any Hadoop JSON UDF, will (and should) ask for a schema.
In your case, if you know the data is always an array at that level, write your own UDF that casts it to an array without any schema. But I don't think any good, generic JSON UDF will support that for your case.
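A minimal sketch of such a cast, written in Python only to show the logic (a real Hive UDF would normally be written in Java against Hive's UDF API, which is not shown here). The helper name `as_json_array` is hypothetical. It parses the string that get_json_object returns and turns the "this field is always an array" knowledge into an explicit check that fails loudly, instead of a blind cast:

```python
import json
from typing import Any, List


def as_json_array(raw: str) -> List[Any]:
    """Parse a JSON string and insist that the top-level value is an array.

    A generic parser cannot know the shape in advance: json.loads may return
    a dict, list, str, int, float, bool or None. This (hypothetical) helper
    encodes the caller's knowledge of the data as an explicit type check.
    """
    value = json.loads(raw)
    if not isinstance(value, list):
        raise ValueError(f"expected a JSON array, got {type(value).__name__}")
    return value
```

For example, `as_json_array('[{"id":1},{"id":2}]')` returns `[{'id': 1}, {'id': 2}]`, while passing `'{"id":1}'` raises `ValueError`. That check is exactly the part a generic, schema-less UDF cannot decide on its own.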
Yong

> Date: Mon, 7 Apr 2014 16:47:44 -0700
> Subject: Re: get_json_object for nested field returning a String instead of an Array
> From: knarayanan88@gmail.com
> To: user@hive.apache.org
> 
> Thanks Peyman.
> 
> Actually the problem with Hive-JSON-Serde is that we need to provide
> the entire schema upfront while creating the table.
> 
> My requirement is that we just project/aggregate on the fields using
> get_json_object after creating the external table without schema. This
> way the external table is agnostic to any new schema changes.
> 
> I would love to find a way to make get_json_object return
> an array instead of a string. Can we use any Hive UDF to convert
> the string into an explodable array?
> 
> Thanks
> Narayanan
> 
> On Mon, Apr 7, 2014 at 4:14 PM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
> > perhaps: https://github.com/rcongiu/Hive-JSON-Serde
> >
> >
> > On Mon, Apr 7, 2014 at 6:52 PM, Narayanan K <knarayanan88@gmail.com> wrote:
> >>
> >> Hi all
> >>
> >> I am using get_json_object to read a json text file. I have created
> >> the external table as below :
> >>
> >> CREATE EXTERNAL TABLE EXT_TABLE ( json string)
> >> PARTITIONED BY (dt string)
> >> LOCATION '/users/abc/';
> >>
> >>
> >> The JSON data has some fields that are not simple fields but nested
> >> fields, like "field" : [{"id":1},{"id":2}.. ].
> >>
> >> When I use get_json_object to retrieve that field, it
> >> returns a string instead of an array. Hence I am not able to
> >> explode it, since it is a string.
> >>
> >> Is there some way to get an array from get_json_object instead of a
> >> string, so that we can perform explode on this nested field? Or is there
> >> any way to convert the string into an array so that I can use explode?
> >>
> >> Thanks in advance,
> >> Narayanan
> >
> >
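For the explode use case in the original question, one alternative to a custom UDF (not suggested in the thread) is Hive's TRANSFORM streaming feature, which pipes rows through an external script. A minimal sketch, assuming each row's json column holds one JSON object on a single line and that the array lives under the key passed as the script's first argument; the script name `explode_field.py` and its function names are hypothetical:

```python
import json
import sys
from typing import Iterable, Iterator, List


def explode_field(json_line: str, key: str) -> List[str]:
    """Return each element of the array under `key`, re-serialized as compact JSON.

    Rows whose value is missing, malformed, or not an array contribute no output.
    """
    try:
        record = json.loads(json_line)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return []
    value = record.get(key) if isinstance(record, dict) else None
    if not isinstance(value, list):
        return []
    return [json.dumps(element, separators=(",", ":")) for element in value]


def run(lines: Iterable[str], key: str) -> Iterator[str]:
    """Turn each input line into zero or more output lines (one per array element)."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip empty lines and \N, Hive's default text encoding of NULL.
        if not line or line == "\\N":
            continue
        yield from explode_field(line, key)


if __name__ == "__main__" and len(sys.argv) > 1:
    # With TRANSFORM, Hive writes one input row per line to stdin and turns
    # every line printed to stdout into one output row.
    for out in run(sys.stdin, sys.argv[1]):
        print(out)
```

It could be shipped with `ADD FILE explode_field.py;` and invoked as `SELECT TRANSFORM(json) USING 'python explode_field.py field' AS (element STRING) FROM EXT_TABLE;`, after which an outer query can apply `get_json_object(element, '$.id')` to each row. This assumes Python is installed on every worker node; the design keeps the table schema-less, as requested, at the cost of running an external process per task.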
 		 	   		  
Mime
View raw message