Subject: Re: how to load json with nested array into hive?
From: Christian Link <christian.link@mdmp.com>
To: user@hive.apache.org
Date: Mon, 23 Jun 2014 19:23:29 +0200

Hi,

thanks...but I need to sort things out with ONE SerDe/strategy...

I've started with André's idea by using Roberto Congiu's SerDe and André's
template to create a table with the right schema and loading the data
afterwards.

But it's not completely working...

I did the following (sorry for spamming...):
1. create table and load data:

-- create database (if not exists)
CREATE DATABASE IF NOT EXISTS mdmp_api_dump;

-- connect to database
USE mdmp_api_dump;

-- add SerDe for json processing
ADD JAR /home/hadoop/lib/hive/json-serde-1.1.4-jar-with-dependencies.jar;

-- drop old raw data
DROP TABLE IF EXISTS mdmp_raw_data;

-- create raw data table
CREATE TABLE mdmp_raw_data (
  action string,
  batch array<
    struct<
      timestamp:string,
      traits:map<string,string>,
      requestId:string,
      sessionId:string,
      event:string,
      userId:string,
      action:string,
      context:map<string,string>,
      properties:map<string,string>
    >
  >,
  context struct<
    build:map<string,string>,
    device:struct<
      brand:string,
      manufacturer:string,
      model:string,
      release:string,
      sdk:int
    >,
    display:struct<
      density:double,
      height:int,
      width:int
    >,
    integrations:map<string,boolean>,
    library:string,
    libraryVersion:string,
    locale:map<string,string>,
    location:map<string,string>,
    telephony:map<string,string>,
    wifi:map<string,boolean>
  >,
  received_at string,
  requestTimestamp string,
  writeKey string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;

-- load data
LOAD DATA INPATH 'hdfs:///input-api/1403181319.json' OVERWRITE INTO TABLE `mdmp_raw_data`;

2. run query against the "raw data" and create a "formatted table":

ADD JAR /home/hadoop/lib/hive/json-serde-1.1.4-jar-with-dependencies.jar;

USE mdmp_api_dump;

DROP TABLE IF EXISTS mdmp_api_data;

CREATE TABLE mdmp_api_data AS
SELECT DISTINCT
  a.action,
  a.received_at,
  a.requestTimestamp,
  a.writeKey,
  a.context.device.brand as brand,
  a.context.device.manufacturer as manufacturer,
  a.context.device.model as model,
  a.context.device.release as release,
  a.context.device.sdk as sdk,
  -- a.context.display.density as density,
  a.context.display.height as height,
  a.context.display.width as width,
  a.context.telephony['radio'] as tel_radio,
  a.context.telephony['carrier'] as tel_carrier,
  a.context.wifi['connected'] as wifi_connected,
  a.context.wifi['available'] as wifi_available,
  a.context.locale['carrier'] as loce_carrier,
  a.context.locale['language'] as loce_language,
  a.context.locale['country'] as loce_country,
  a.context.integrations['Tapstream'] as int_tapstream,
  a.context.integrations['Amplitude'] as int_amplitude,
  a.context.integrations['Localytics'] as int_localytics,
  a.context.integrations['Flurry'] as int_flurry,
  a.context.integrations['Countly'] as int_countly,
  a.context.integrations['Quantcast'] as int_quantcast,
  a.context.integrations['Crittercism'] as int_crittercism,
  a.context.integrations['Google Analytics'] as int_googleanalytics,
  a.context.integrations['Mixpanel'] as int_mixpanel,
  b.batch.action AS b_action,
  b.batch.context,
  b.batch.event,
  b.batch.properties,
  b.batch.requestId,
  b.batch.sessionId,
  b.batch.timestamp,
  b.batch.traits,
  b.batch.userId
FROM mdmp_raw_data a
LATERAL VIEW explode(a.batch) b AS batch;

So far so good, I thought... (besides a silly double/int bug in the outdated SerDe)

But it turned out that some fields are NULL - within all records.

Affected fields are:
  b.batch.event,
  b.batch.requestId,
  b.batch.sessionId,
  b.batch.userId

I can see values in the json file, but neither in the "raw table" nor in
the final table...that's really strange.
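One hunch I still want to rule out tomorrow (untested, so take it with a
grain of salt): Hive treats column and struct field names as
case-insensitive and stores them lowercased, so my requestId may
effectively be looked up as requestid, which doesn't exist in the JSON.
That would fit three of the four NULL fields, though not event, which is
already lowercase, so it may not be the whole story. Roberto's SerDe
documents a "mapping" SERDEPROPERTIES feature for exactly this kind of
renaming; a minimal sketch, assuming the 1.1.4 build has it and that the
mapping also applies to fields nested inside the batch array:

-- untested sketch: lowercase field names in the schema, mapped back to
-- the camelCase keys in the JSON via mapping.<hive-name> = <json-name>
DROP TABLE IF EXISTS mdmp_case_test;
CREATE TABLE mdmp_case_test (
  batch array<
    struct<
      requestid:string,
      sessionid:string,
      userid:string,
      event:string
    >
  >
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  "mapping.requestid" = "requestId",
  "mapping.sessionid" = "sessionId",
  "mapping.userid"    = "userId"
)
STORED AS TEXTFILE;

-- load a fresh copy of the file (LOAD DATA INPATH moves it in HDFS),
-- then spot-check the exploded fields
SELECT b.batch.requestid, b.batch.sessionid, b.batch.userid, b.batch.event
FROM mdmp_case_test a
LATERAL VIEW explode(a.batch) b AS batch
LIMIT 5;

If the IDs come back with that, I'd add the same mappings
(requestTimestamp, libraryVersion, writeKey, ...) to the full table.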
An example record (line breaks added for readability - in the file it is a
single line):

{"requestTimestamp":"2014-06-19T14:25:26+02:00",
 "context":{"libraryVersion":"0.6.13","telephony":{"radio":"gsm","carrier":"o2 - de"},
   "wifi":{"connected":true,"available":true},"location":{},
   "locale":{"carrier":"o2 - de","language":"Deutsch","country":"Deutschland"},
   "library":"analytics-android",
   "device":{"brand":"htc","model":"HTC One S","sdk":16,"release":"4.1.1","manufacturer":"HTC"},
   "display":{"density":1.5,"width":540,"height":960},
   "build":{"name":"1.0","code":1},
   "integrations":{"Tapstream":false,"Amplitude":false,"Localytics":false,"Flurry":false,"Countly":false,"Bugsnag":false,"Quantcast":false,"Crittercism":false,"Google Analytics":false,"Mixpanel":false}},
 "batch":[
   {"timestamp":"2014-06-19T14:25:17+02:00","requestId":"32377337-3f99-4ac5-bfc6-d3654584655b","sessionId":"75cd18db8a364c2","event":"TEST Doge Comments","userId":"doge74167705ruffruff","action":"track","context":{"library":"analytics-android","libraryVersion":"0.6.13"},"properties":{"comment":"Ruff ruff!"}},
   {"timestamp":"2014-06-19T14:25:18+02:00","requestId":"fbfd45c9-cf0f-4cb3-955c-85c65220a5bd","sessionId":"75cd18db8a364c2","event":"TEST Doge Purchase","userId":"doge74167705ruffruff","action":"track","context":{"library":"analytics-android","libraryVersion":"0.6.13"},"properties":{"revenue":"0,08"}},
   {"timestamp":"2014-06-19T14:25:18+02:00","requestId":"3a643b12-64e5-4a7c-b44b-e3e09dbc5b66","sessionId":"75cd18db8a364c2","event":"TEST Doge Comments","userId":"doge74167705ruffruff","action":"track","context":{"library":"analytics-android","libraryVersion":"0.6.13"},"properties":{"comment":"Wow..."}},
   {"action":"identify","context":{"library":"analytics-android","libraryVersion":"0.6.13"},"timestamp":"2014-06-19T14:25:19+02:00","traits":{"email":"doges@mdmp.com","name":"Carmelo Doge"},"requestId":"ef2910f4-cd4f-4175-89d0-7d91b35c229f","sessionId":"75cd18db8a364c2","userId":"doge74167705ruffruff"},
   {"timestamp":"2014-06-19T14:25:19+02:00","requestId":"1676bb06-abee-4135-a206-d57c4a1bc24d","sessionId":"75cd18db8a364c2","event":"TEST Doge App Usage","userId":"doge74167705ruffruff","action":"track","context":{"library":"analytics-android","libraryVersion":"0.6.13"},"properties":{}},
   {"timestamp":"2014-06-19T14:25:20+02:00","requestId":"66532c8a-c5da-4852-b8b6-04df8f3052d5","sessionId":"75cd18db8a364c2","event":"TEST Doge Comments","userId":"doge74167705ruffruff","action":"track","context":{"library":"analytics-android","libraryVersion":"0.6.13"},"properties":{"comment":"Many data."}},
   {"timestamp":"2014-06-19T14:25:21+02:00","requestId":"a1a79d8c-fe58-4567-8dec-a8d1d2ae2713","sessionId":"75cd18db8a364c2","event":"TEST Doge Purchase","userId":"doge74167705ruffruff","action":"track","context":{"library":"analytics-android","libraryVersion":"0.6.13"},"properties":{"revenue":"0,87"}},
   {"timestamp":"2014-06-19T14:25:21+02:00","requestId":"259209ac-b135-4d5f-bdac-535eccc0400e","sessionId":"75cd18db8a364c2","event":"TEST Doge Comments","userId":"doge74167705ruffruff","action":"track","context":{"library":"analytics-android","libraryVersion":"0.6.13"},"properties":{"comment":"Wow..."}},
   {"timestamp":"2014-06-19T14:25:23+02:00","requestId":"59b6d57c-c7a5-4b2a-af6d-fa10ae0de60c","sessionId":"75cd18db8a364c2","event":"TEST Doge Comments","userId":"doge74167705ruffruff","action":"track","context":{"library":"analytics-android","libraryVersion":"0.6.13"},"properties":{"comment":"Such App!"}},
   {"timestamp":"2014-06-19T14:25:24+02:00","requestId":"8b05226f-bdf5-4af8-bb91-84da1b874c6e","sessionId":"75cd18db8a364c2","event":"TEST Doge Purchase","userId":"doge74167705ruffruff","action":"track","context":{"library":"analytics-android","libraryVersion":"0.6.13"},"properties":{"revenue":"0,50"}},
   {"timestamp":"2014-06-19T14:25:24+02:00","requestId":"0f366675-5641-4238-b2a9-176735de6edd","sessionId":"75cd18db8a364c2","event":"TEST Doge Comments","userId":"doge74167705ruffruff","action":"track","context":{"library":"analytics-android","libraryVersion":"0.6.13"},"properties":{"comment":"Ruff ruff!"}},
   {"timestamp":"2014-06-19T14:25:26+02:00","requestId":"9e832534-5114-4ec1-bc20-1dcf1c354d0c","sessionId":"75cd18db8a364c2","event":"Session end","userId":"doge74167705ruffruff","action":"track","context":{"library":"analytics-android","libraryVersion":"0.6.13"},"properties":{"start":"14:25:09","end":"14:25:26"}}
 ],
 "writeKey":"a8RCFSAVjmT5qyxLKMzt12kcXWOIusvw",
 "action":"import",
 "received_at":"2014-06-19T12:25:29.790+00:00"}

Funny thing is that I'm sure I've seen these values earlier today...I've
reloaded the data/tables several times to see if this is still
working...well. :)

I'm gonna stop for today...another try tomorrow.

Thanks so far and many greetings from Berlin,
Chris


On Mon, Jun 23, 2014 at 6:57 PM, Sachin Goyal <sgoyal@walmartlabs.com> wrote:

> You can also use hive-json-schema to automate Hive schema generation from
> JSON:
> https://github.com/quux00/hive-json-schema
>
>
> From: Nitin Pawar <nitinpawar432@gmail.com>
> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
> Date: Monday, June 23, 2014 at 2:25 AM
> To: "user@hive.apache.org" <user@hive.apache.org>
> Subject: Re: how to load json with nested array into hive?
>
> I think you can just take a look at jsonserde
>
> It does take care of nested json documents. (though you will need to know
> the entire json structure upfront)
>
> Here is an example of using it:
> http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/
>
>
> On Mon, Jun 23, 2014 at 2:28 PM, Christian Link <christian.link@mdmp.com> wrote:
> Hi Jerome,
>
> thanks...I've already found "Brickhouse" and the Hive UDFs, but it didn't
> help.
>
> Today I'll try again to process the json file after going through all my
> mails...maybe I'll find a solution.
>
> Best,
> Chris
>
>
> On Fri, Jun 20, 2014 at 7:16 PM, Jerome Banks <jbanks@tagged.com> wrote:
> Christian,
>   Sorry to spam this newsgroup, and this is not a commercial endorsement,
> but check out the Hive UDFs in the Brickhouse project (
> http://github.com/klout/brickhouse ) (
> http://brickhouseconfessions.wordpress.com/2014/02/07/hive-and-json-made-simple/ )
>
> You can convert arbitrarily complex Hive structures to and from json with
> its to_json and from_json UDFs. See the blog posting for an explanation.
>
> -- jerome
>
>
> On Fri, Jun 20, 2014 at 8:26 AM, Christian Link <christian.link@mdmp.com> wrote:
> hi,
>
> I'm very, very new to Hadoop, Hive, etc. and I have to import data into
> hive tables.
>
> Environment: Amazon EMR, S3, etc.
>
> The input file is on S3 and I copied it into my HDFS.
>
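To double-check that these values really are in the file (and not silently
dropped on the way in), I also want to read the raw line with
get_json_object, which matches keys case-sensitively on the raw string.
An untested sketch, using a hypothetical one-column helper table like the
flat table from my very first attempt:

-- hypothetical helper table: one json record per line, as a plain string
CREATE TABLE mdmp_raw_json (json_record STRING);
LOAD DATA INPATH 'hdfs:///input-api/1403181319.json' OVERWRITE INTO TABLE `mdmp_raw_json`;

-- pull the camelCase keys of the first batch element straight out of the string
SELECT get_json_object(json_record, '$.batch[0].requestId') AS requestId,
       get_json_object(json_record, '$.batch[0].sessionId') AS sessionId,
       get_json_object(json_record, '$.batch[0].userId')    AS userId,
       get_json_object(json_record, '$.batch[0].event')     AS event
FROM mdmp_raw_json
LIMIT 5;

If these come back non-NULL while the struct columns stay NULL, the problem
is in the SerDe/schema, not in the data.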
> 1. flat table with one column and loaded data into it:
>
>   CREATE TABLE mdmp_raw_data (json_record STRING);
>   LOAD DATA INPATH 'hdfs:///input-api/1403181319.json' OVERWRITE INTO
>   TABLE `mdmp_raw_data`;
>
> That worked, I can access some data, like this:
>
> SELECT d.carrier, d.language, d.country
>   FROM mdmp_raw_data a LATERAL VIEW json_tuple(a.data, 'requestTimestamp',
>   'context') b AS requestTimestamp, context
>   LATERAL VIEW json_tuple(b.context, 'locale') c AS locale
>   LATERAL VIEW json_tuple(c.locale, 'carrier', 'language', 'country') d AS
>   carrier, language, country
>   LIMIT 1;
>
> Result: o2 - de  Deutsch  Deutschland
>
> I can also select the array at once:
>
> SELECT b.requestTimestamp, b.batch
>   FROM mdmp_raw_data a
>   LATERAL VIEW json_tuple(a.data, 'requestTimestamp', 'batch') b AS
>   requestTimestamp, batch
>   LIMIT 1;
>
> This will give me:
>
> [{"timestamp":"2014-06-19T14:25:18+02:00","requestId":"2ca08247-5542-4cb4-be7e-4a8574fb77a8","sessionId":"f29ec175ca6b7d10","event":"TEST Doge Comments","userId":"doge96514016ruffruff","action":"track","context":{"library":"analytics-android","libraryVersion":"0.6.13"},"properties":{"comment":"Much joy."}}, ...]
>
> This "batch" may contain n events with a structure like above.
>
> I want to put all events in a table where each "element" will be stored in
> a unique column: timestamp, requestId, sessionId, event, userId, action,
> context, properties
>
> 2. explode the "batch": I read a lot about SerDe, etc. - but I don't get it.
>
> - I tried to create a table with an array and load the data into it -
>   several errors
> - used explode in the query, but it doesn't accept "batch" as an array
> - integrated several SerDes, but get things like "unknown function jspilt"
> - I'm lost in too many documents, howtos, etc. and could need some
>   advice...
>
> Thank you in advance!
>
> Best, Chris
>
>
>
> --
> Nitin Pawar