hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sachin Goyal <sgo...@walmartlabs.com>
Subject Re: how to load json with nested array into hive?
Date Mon, 23 Jun 2014 16:57:09 GMT

You can also use hive-json-schema to automate Hive schema generation from JSON:
https://github.com/quux00/hive-json-schema


From: Nitin Pawar <nitinpawar432@gmail.com<mailto:nitinpawar432@gmail.com>>
Reply-To: "user@hive.apache.org<mailto:user@hive.apache.org>" <user@hive.apache.org<mailto:user@hive.apache.org>>
Date: Monday, June 23, 2014 at 2:25 AM
To: "user@hive.apache.org<mailto:user@hive.apache.org>" <user@hive.apache.org<mailto:user@hive.apache.org>>
Subject: Re: how to load json with nested array into hive?

I think you can just take a look at jsonserde

It does take care of nested json documents. (though you will need to know entire json structure
upfront)

Here is example of using it http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/




On Mon, Jun 23, 2014 at 2:28 PM, Christian Link <christian.link@mdmp.com<mailto:christian.link@mdmp.com>>
wrote:
Hi Jerome,

thanks...I've already found "Brickhouse" and the Hive UDFs, but it didn't help.

Today I'll try again to process the json file after going through all my mails...maybe I'll
find a solution.

Best,
Chris


On Fri, Jun 20, 2014 at 7:16 PM, Jerome Banks <jbanks@tagged.com<mailto:jbanks@tagged.com>>
wrote:
Christian,
   Sorry to spam this newsgroup, and this is not a commercial endorsement, but check out the
Hive UDFs in the Brickhouse project ( http://github.com/klout/brickhouse ) ( http://brickhouseconfessions.wordpress.com/2014/02/07/hive-and-json-made-simple/
)

You can convert arbitrary complex Hive structures to an from json with it's to_json and from_json
UDF's. See the blog posting for an explanation.

-- jerome


On Fri, Jun 20, 2014 at 8:26 AM, Christian Link <christian.link@mdmp.com<mailto:christian.link@mdmp.com>>
wrote:
hi,

I'm very, very new to Hadoop, Hive, etc. and I have to import data into hive tables.

Environment: Amazon EMR, S3, etc.

The input file is on S3 and I copied it into my HDFS.

1. flat table with one column and loaded data into it:

  CREATE TABLE mdmp_raw_data (json_record STRING);
  LOAD DATA INPATH 'hdfs:///input-api/1403181319.json' OVERWRITE INTO TABLE `mdmp_raw_data`;
That worked, I can access some data, like this:

SELECT d.carrier, d.language, d.country
  FROM mdmp_raw_data a LATERAL VIEW json_tuple(a.data, 'requestTimestamp', 'context') b  
 AS requestTimestamp, context
  LATERAL VIEW json_tuple(b.context, 'locale') c AS locale
  LATERAL VIEW json_tuple(c.locale, 'carrier', 'language', 'country') d AS carrier, language,
country
  LIMIT 1;

Result: o2 - de Deutsch Deutschland

I can also select the array at once:

SELECT b.requestTimestamp, b.batch
  FROM mdmp_raw_data a
  LATERAL VIEW json_tuple(a.data, 'requestTimestamp', 'batch') b AS requestTimestamp, batch
  LIMIT 1;
This will give me:

 [{"timestamp":"2014-06-19T14:25:18+02:00","requestId":"2ca08247-5542-4cb4-be7e-4a8574fb77a8","sessionId":"f29ec175ca6b7d10","event":"TEST
Doge Comments","userId":"doge96514016ruffruff","action":"track","context":{"library":"analytics-android","libraryVersion":"0.6.13"},"properties":{"comment":"Much
joy."}}, ...]

This "batch" may contain n events will a structure like above.

I want to put all events in a table where each "element" will be stored in a unique column:
timestamp, requestId, sessionId, event, userId, action, context, properties

2. explode the "batch" I read a lot about SerDe, etc. - but I don't get it.

- I tried to create a table with an array and load the data into it - several errors
use explode in query but it doesn't accept "batch" as array
- integrated several SerDes but get things like "unknown function jspilt"
- I'm lost in too many documents, howtos, etc. and could need some advices...

Thank you in advance!

Best, Chris





--
Nitin Pawar

Mime
View raw message