Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 0BA1ED761 for ; Tue, 18 Sep 2012 21:15:02 +0000 (UTC) Received: (qmail 36785 invoked by uid 500); 18 Sep 2012 21:15:00 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 36713 invoked by uid 500); 18 Sep 2012 21:15:00 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 36704 invoked by uid 99); 18 Sep 2012 21:15:00 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 18 Sep 2012 21:15:00 +0000 X-ASF-Spam-Status: No, hits=1.8 required=5.0 tests=FREEMAIL_ENVFROM_END_DIGIT,FSL_RCVD_USER,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of mark.golden1744@gmail.com designates 209.85.213.48 as permitted sender) Received: from [209.85.213.48] (HELO mail-yw0-f48.google.com) (209.85.213.48) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 18 Sep 2012 21:14:55 +0000 Received: by yhfq46 with SMTP id q46so98759yhf.35 for ; Tue, 18 Sep 2012 14:14:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=x6QsjscGkWt/z8tn8WIn7MhE/7jGsoqikg8F/OfZDLc=; b=zZgKxgcX6+7Z3S/9DogLGh9/QXvZUEk9oQaU3wUqp84HHftaUCcNrellllWK0mvcgP Ov4uQYO64UAavHgkHdUMdaoJIIN1Ij5KgxxhB8DFsRva1WRqd/DTDzOaqMePp+AlpmlE u990m9YehNNeGXF/9ZzZFeS6qc1g0Y8yEY3cj5ujdLjGjucmlzSxC7WiqpzAIWqjzcxK vfhKqHM6JZAQuTeOJwd84WGOzCIFLFEZp4SXqWFS2/RzIo+yq8W9WZ8tQPdrS+683NiV WlAGiQlq6hqWaF/5pP/CtBjAHVRfIBc0hNchDtL4FGIiCRadpfx8Y/IUnIF/m8tyA51n Cc6Q== MIME-Version: 1.0 Received: by 10.236.161.135 with SMTP id w7mr1763891yhk.15.1348002875211; Tue, 18 Sep 2012 14:14:35 -0700 (PDT) Received: by 10.147.19.21 with HTTP; Tue, 18 Sep 2012 14:14:35 -0700 (PDT) In-Reply-To: <9D8A350A3269554E91B45801B5E8CDAC67FB69@SOM-EXCH02.nuance.com> References: <9D8A350A3269554E91B45801B5E8CDAC67F879@SOM-EXCH02.nuance.com> <9D8A350A3269554E91B45801B5E8CDAC67F88E@SOM-EXCH02.nuance.com> <9D8A350A3269554E91B45801B5E8CDAC67F973@SOM-EXCH02.nuance.com> <9D8A350A3269554E91B45801B5E8CDAC67F9DB@SOM-EXCH02.nuance.com> <9D8A350A3269554E91B45801B5E8CDAC67FB69@SOM-EXCH02.nuance.com> Date: Tue, 18 Sep 2012 14:14:35 -0700 Message-ID: Subject: Re: hive json serde From: Mark Golden To: user@hive.apache.org Content-Type: multipart/alternative; boundary=20cf30549fe58881b304ca00605e X-Virus-Checked: Checked by ClamAV on apache.org --20cf30549fe58881b304ca00605e Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable I agree that we can use $.C.D & C.D directly but if the key is not present and I have several such keys then it is better to not create so many columns. On Tue, Sep 18, 2012 at 5:47 AM, Connell, Chuck w= rote: > OK, but keep in mind that you could use > get_json_object(whole-json-field, $.C.D). If using a serde, then you can > say just C.D directly.**** > > ** ** > > Chuck Connell**** > > Nuance R&D Data Team**** > > Burlington, MA**** > > ** ** > > ** ** > > *From:* Mark Golden [mailto:mark.golden1744@gmail.com] > *Sent:* Monday, September 17, 2012 7:31 PM > > *To:* Connell, Chuck > *Cc:* user@hive.apache.org > *Subject:* Re: hive json serde**** > > ** ** > > Thanks Chuck. > > I tried both json_tuple & get_json_objects on top of the serde table and > it works too. I have nested jsons and some of the keys do not have a > specific schema (You can throw whatever you want there) so if I want to > query from those keys then I have to use get_json_object on top of the > serde table. If I have the following json > > { "A" : "I love hive", "B" : "User mailing list is great", "C" : { "D" : > "chuck", "E" : "is awesome and thanks for your help" } } > { "A" : "I love hive", "B" : "User mailing list is great", "C" : { "G" : > "It will rain in NY tonight" } } > { "A" : "I love hive", "B" : "User mailing list is great", "C" : { "H" : > "No it will snow in NY" } } > > If I create a table on top of json like this then the serde will work, > json_tuple & get_json_object will also work but to query the keys inside > "C" I need to do get_json_object on "C" something like > > select * > from my_serde_table > where get_json_object(C, "$.D") ilike "chuck" > > I like serde table because it is much cleaner than create a table with > (value string) and then doing get_json_object or json_tuple and extract a= ll > the columns out. > > I'm exposing this table to other users in my company so I want to have th= e > serde table. Thanks for your help.**** > > On Mon, Sep 17, 2012 at 3:49 PM, Connell, Chuck > wrote:**** > > A few things that should help... > > - There are three ways to handle JSON within a Hive table, > get_json_object, json_tuple, and JSON SerDe. These are separate mechanism= s. > So if you want to use get_json_object, you don't need to load a SerDe. > > - With a SerDe, Hive understands JSON fields natively, so you can just us= e > the plain field names, no special function call at all. > > - I'm not sure if auxpath accepts a relative path, but I know the full > path works. > > Chuck > > > **** > ------------------------------ > > *From:* Mark Golden [mark.golden1744@gmail.com] > *Sent:* Monday, September 17, 2012 5:58 PM > *To:* Connell, Chuck > *Cc:* user@hive.apache.org**** > > > *Subject:* Re: hive json serde**** > > ** ** > > It works now. Looks like there is a bug in the code. > > if you do hive --auxpath ./serde then I get an error but if I get the ful= l > path as > hive --auxpath /var/lib/hdfs/serde/ then get_json_object() works. > > Thanks for your help. Appreciate it. > > CCing hive user group as well. > > **** > > On Mon, Sep 17, 2012 at 2:54 PM, Mark Golden > wrote:**** > > :-) Thanks. I downloaded and got it compiled as well. > > My table has nested JSON values so when I tried to do > get_json_object(,'$.') this serde doesn't work in that case. > It throws an error. I'll email the developer too. > > When you tried did you try this on a cluster of nodes or on a single > instance? ** ** > > ** ** > > On Mon, Sep 17, 2012 at 2:38 PM, Connell, Chuck > wrote:**** > > Go to the Downloads link on the project page, near the upper right. > > **** > ------------------------------ > > *From:* Mark Golden [mark.golden1744@gmail.com] > *Sent:* Monday, September 17, 2012 5:32 PM > *To:* Connell, Chuck > *Subject:* Re: hive json serde**** > > Hey Chuck, > > Where did you download the prebuild jar from?**** > > On Mon, Sep 17, 2012 at 12:10 PM, Connell, Chuck > wrote:**** > > I used his pre-built jar. No need to compile anything.**** > > **** > > Be sure to add =93 --auxpath /path/to/jar/dir =93 to the Hive command lin= e.*** > * > > **** > > Chuck**** > > **** > > **** > > **** > > *From:* Connell, Chuck [mailto:Chuck.Connell@nuance.com] > *Sent:* Monday, September 17, 2012 3:06 PM > *To:* user@hive.apache.org > *Subject:* RE: hive json serde**** > > **** > > I just finished testing this one. No problems found. The developer is als= o > quite responsive to issues raised. I encouraged him to submit it to the > Hive dev team as core code.**** > > **** > > https://github.com/rcongiu/Hive-JSON-Serde/**** > > **** > > Chuck Connell**** > > Nuance R&D Data Team**** > > Burlington, MA**** > > **** > > **** > > *From:* Mark Golden [mailto:mark.golden1744@gmail.com] > > *Sent:* Monday, September 17, 2012 2:56 PM > *To:* user@hive.apache.org > *Subject:* hive json serde**** > > **** > > Hi, > > I'm looking for a hive json serde that works well. Any suggestions? I did > a quick google and found a few but none of them are complete. > > Thanks, > Mark.**** > > ** ** > > ** ** > > ** ** > > ** ** > --20cf30549fe58881b304ca00605e Content-Type: text/html; charset=windows-1252 Content-Transfer-Encoding: quoted-printable I agree that we can use $.C.D & C.D directly but if the key is not pres= ent and I have several such keys then it is better to not create so many co= lumns.

On Tue, Sep 18, 2012 at 5:47 AM, C= onnell, Chuck <Chuck.Connell@nuance.com> wrote:

OK, but keep in mind that= you could use get_json_object(whole-json-field, $.C.D). If using a serde, = then you can say just C.D directly.

=A0<= /p>

Chuck Connell

Nuance R&D Data Team

Burlington, MA

=A0

=A0<= /p>

From: Mark Gol= den [mailto:= mark.golden1744@gmail.com]
Sent: Monday, September 17, 2012 7:31 PM


To: Connell, Chuck
Cc: user@h= ive.apache.org
Subject: Re: hive json serde

=A0

Thanks Chuck.

I tried both json_tuple & get_json_objects on top of the serde table an= d it works too. I have nested jsons and some of the keys do not have a spec= ific schema (You can throw whatever you want there) so if I want to query f= rom those keys then I have to use get_json_object on top of the serde table. If I have the following json

{ "A" : "I love hive", "B" : "User maili= ng list is great", "C" : { "D" : "chuck"= , "E" : "is awesome and thanks for your help" } }
{ "A" : "I love hive", "B" : "User maili= ng list is great", "C" : { "G" : "It will rai= n in NY tonight" } }
{ "A" : "I love hive", "B" : "User maili= ng list is great", "C" : { "H" : "No it will = snow in NY" } }

If I create a table on top of json like this then the serde will work, json= _tuple & get_json_object will also work but to query the keys inside &q= uot;C" I need to do get_json_object on "C" something like
select *
from my_serde_table
where get_json_object(C, "$.D") ilike "chuck"

I like serde table because it is much cleaner than create a table with (val= ue string) and then doing get_json_object or json_tuple and extract all the= columns out.

I'm exposing this table to other users in my company so I want to have = the serde table. Thanks for your help.

On Mon, Sep 17, 2012 at 3:49 PM, Connell, Chuck <= Chuck.Connell= @nuance.com> wrote:

A few thi= ngs that should help...

- There are three ways to handle JSON within a Hive table, get_json_object,= json_tuple, and JSON SerDe. These are separate mechanisms. So if you want = to use get_json_object, you don't need to load a SerDe.

- With a SerDe, Hive understands JSON fields natively, so you can just use = the plain field names, no special function call at all.

- I'm not sure if auxpath accepts a relative path, but I know the full = path works.

Chuck



From: Mark Golden [mark.golden1744@gmail.com]
Sent: Monday, September 17, 2012 5:58 PM
To: Connell, Chuck
Cc: user@h= ive.apache.org


Subject: Re: hive json serde

=A0

It works now. Looks l= ike there is a bug in the code.

if you do hive --auxpath ./serde then I get an error but if I get the full = path as
hive --auxpath /var/lib/hdfs/serde/ then get_json_object() works.

Thanks for your help. Appreciate it.

CCing hive user group as well.

On Mon, Sep 17, 2012 at 2:54 PM, Mark Golden <mark.golden1744= @gmail.com> wrote:

:-) Thanks. I downloaded and got it compiled as well= .

My table has nested JSON values so when I tried to do get_json_object(<c= olumn>,'$.<key>') this serde doesn't work in that case= . It throws an error. I'll email the developer too.

When you tried did you try this on a cluster of nodes or on a single instan= ce?

=A0

On Mon, Sep 17, 2012 at 2:38 PM, Connell, Chuck <= Chuck.Connell= @nuance.com> wrote:

Go to the= Downloads link on the project page, near the upper right.


From: Mark Golden= [mark.golde= n1744@gmail.com]
Sent: Monday, September 17, 2012 5:32 PM
To: Connell, Chuck
Subject: Re: hive json serde

Hey Chuck,

Where did you download the prebuild jar from?

On Mon, Sep 17, 2012 at 12:10 PM, Connell, Chuck <= ;Chuck.Connel= l@nuance.com> wrote:

I used his pre-built jar.= No need to compile anything.

=A0<= /p>

Be sure to add =93 --auxp= ath /path/to/jar/dir =93 to the Hive command line.

=A0<= /p>

Chuck

=A0<= /p>

=A0<= /p>

=A0<= /p>

From: Connell,= Chuck [mailto:Chuck.Connell@nuance.com]
Sent: Monday, September 17, 2012 3:06 PM
To: user@h= ive.apache.org
Subject: RE: hive json serde

=A0

I just finished testing t= his one. No problems found. The developer is also quite responsive to issue= s raised. =A0I encouraged him to submit it to the Hive dev team as core code= .

=A0<= /p>

https://github.com/rcongiu= /Hive-JSON-Serde/

=A0<= /p>

Chuck Connell=

Nuance R&D Data Team

Burlington, MA

=A0

=A0<= /p>

From: Mark Gol= den [mailto:= mark.golden1744@gmail.com]
Sent: Monday, September 17, 2012 2:56 PM
To: user@h= ive.apache.org
Subject: hive json serde

=A0

Hi,

I'm looking for a hive json serde that works well. Any suggestions? I d= id a quick google and found a few but none of them are complete.

Thanks,
Mark.

=A0

=A0

=A0

=A0


--20cf30549fe58881b304ca00605e--