hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jerome Banks <jba...@tagged.com>
Subject Re: how to load json with nested array into hive?
Date Fri, 20 Jun 2014 17:16:29 GMT
Christian,
   Sorry to spam this newsgroup, and this is not a commercial endorsement,
but check out the Hive UDFs in the Brickhouse project (
http://github.com/klout/brickhouse ) (
http://brickhouseconfessions.wordpress.com/2014/02/07/hive-and-json-made-simple/
)

You can convert arbitrary complex Hive structures to an from json with it's
to_json and from_json UDF's. See the blog posting for an explanation.

-- jerome


On Fri, Jun 20, 2014 at 8:26 AM, Christian Link <christian.link@mdmp.com>
wrote:

> hi,
>
> I'm very, very new to Hadoop, Hive, etc. and I have to import data into
> hive tables.
>
> Environment: Amazon EMR, S3, etc.
>
> The input file is on S3 and I copied it into my HDFS.
>
> 1. flat table with one column and loaded data into it:
>
>   CREATE TABLE mdmp_raw_data (json_record STRING);
>   LOAD DATA INPATH 'hdfs:///input-api/1403181319.json' OVERWRITE INTO
> TABLE `mdmp_raw_data`;
> That worked, I can access some data, like this:
>
> SELECT d.carrier, d.language, d.country
>   FROM mdmp_raw_data a LATERAL VIEW json_tuple(a.data, 'requestTimestamp',
> 'context') b    AS requestTimestamp, context
>   LATERAL VIEW json_tuple(b.context, 'locale') c AS locale
>   LATERAL VIEW json_tuple(c.locale, 'carrier', 'language', 'country') d AS
> carrier, language, country
>   LIMIT 1;
>
> Result: o2 - de Deutsch Deutschland
>
> I can also select the array at once:
>
> SELECT b.requestTimestamp, b.batch
>   FROM mdmp_raw_data a
>   LATERAL VIEW json_tuple(a.data, 'requestTimestamp', 'batch') b AS
> requestTimestamp, batch
>   LIMIT 1;
> This will give me:
>
>  [{"timestamp":"2014-06-19T14:25:18+02:00","requestId":"2ca08247-5542-4cb4-be7e-4a8574fb77a8","sessionId":"f29ec175ca6b7d10","event":"TEST
> Doge
> Comments","userId":"doge96514016ruffruff","action":"track","context":{"library":"analytics-android","libraryVersion":"0.6.13"},"properties":{"comment":"Much
> joy."}}, ...]
>
> This "batch" may contain n events will a structure like above.
>
> I want to put all events in a table where each "element" will be stored in
> a unique column: timestamp, requestId, sessionId, event, userId, action,
> context, properties
>
> 2. explode the "batch" I read a lot about SerDe, etc. - but I don't get it.
>
> - I tried to create a table with an array and load the data into it -
> several errors
> use explode in query but it doesn't accept "batch" as array
> - integrated several SerDes but get things like "unknown function jspilt"
> - I'm lost in too many documents, howtos, etc. and could need some
> advices...
>
> Thank you in advance!
>
> Best, Chris
>

Mime
View raw message