Date: Fri, 13 Jun 2014 08:51:17 -0500
Subject: Re: HCatalog access from a Java app
From: Dmitry Vasilenko <dvasilen@gmail.com>
To: user@hive.apache.org

I am not sure about the javadocs... ;-]
I have spent the last three years integrating with HCat, and to make it work I had to go through the code... So here are some samples that may be helpful to start with. If you are using Hive 0.12.0 I would not bother with the new APIs. I had to create some shim classes for HCat to make my code version independent, but I cannot share that.

So:

1. To enumerate tables, just use the Hive metastore client; this seems to be version independent:

    // conf must contain the "hive.metastore.uris" property, which points
    // to your Hive metastore Thrift server
    HiveMetaStoreClient hiveMetastoreClient = new HiveMetaStoreClient(conf);

    // this will get you all the databases
    List<String> databases = hiveMetastoreClient.getAllDatabases();

    // this will get you all the tables for the given database
    List<String> tables = hiveMetastoreClient.getAllTables(database);

2. To get the table schema...
I assume that you are after the HCat schema:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hcatalog.data.schema.HCatSchema;
    import org.apache.hcatalog.mapreduce.HCatInputFormat;
    import org.apache.hcatalog.mapreduce.InputJobInfo;

    Job job = new Job(config);
    job.setJarByClass(XXXXXX.class); // this will be your class
    job.setInputFormatClass(HCatInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    InputJobInfo inputJobInfo =
        InputJobInfo.create("my_data_base", "my_table", "partition filter");
    HCatInputFormat.setInput(job, inputJobInfo);
    HCatSchema s = HCatInputFormat.getTableSchema(job);

3. To read the HCat records...

It depends on how you'd like to read the records: will you be reading ALL the records remotely from the client app, or will you get input splits and read the records on mappers? The code will be different (somewhat)... let me know...

On Fri, Jun 13, 2014 at 8:25 AM, Brian Jeltema <brian.jeltema@digitalenvoy.net> wrote:

> Version 0.12.0.
>
> I'd like to obtain the table's schema, scan a table partition, and use the
> schema to parse the rows.
>
> I can probably figure this out by looking at the HCatalog source. My
> concern was that the HCatalog packages in the Hive distributions are
> excluded from the JavaDoc, which implies that the API is not public. Is
> there a reason for this?
>
> Brian
>
> On Jun 13, 2014, at 9:10 AM, Dmitry Vasilenko wrote:
>
> You should be able to access this information. The exact API depends on
> the version of Hive/HCat. As you know, the earlier HCat API is being
> deprecated and will be removed in Hive 0.14.0. I can provide you with a
> code sample if you tell me what you are trying to do and what version of
> Hive you are using.
>
> On Fri, Jun 13, 2014 at 7:33 AM, Brian Jeltema <
> brian.jeltema@digitalenvoy.net> wrote:
>
>> I'm experimenting with HCatalog, and would like to be able to access
>> tables and their schema from a Java application (not Hive/Pig/MapReduce).
>> However, the API seems to be hidden, which leads me to believe that this
>> is not a supported use case. Is HCatalog use limited to one of the
>> supported frameworks?
>>
>> TIA
>>
>> Brian
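[Editor's note: for item 3 above, the thread leaves the record-reading code open. The sketch below shows the client-side variant (reading ALL records remotely in the client JVM) against the Hive 0.12 `org.apache.hcatalog` API discussed in the thread. It is a sketch under assumptions, not a tested implementation: the metastore URI, database, and table names are placeholders, it requires the Hive/HCatalog and Hadoop jars plus a reachable metastore, and `TaskAttemptContextImpl` assumes a Hadoop 2 classpath.]

```java
// Hypothetical sketch: client-side read of all HCat records by driving
// HCatInputFormat directly, one split at a time, in the local JVM.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl; // Hadoop 2
import org.apache.hcatalog.data.HCatRecord;
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.mapreduce.HCatInputFormat;
import org.apache.hcatalog.mapreduce.InputJobInfo;

public class HCatClientSideRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed metastore location; replace with your Thrift URI.
        conf.set("hive.metastore.uris", "thrift://metastore-host:9083");

        Job job = new Job(conf);
        // null partition filter = read all partitions
        HCatInputFormat.setInput(job,
            InputJobInfo.create("my_data_base", "my_table", null));

        HCatSchema schema = HCatInputFormat.getTableSchema(job);
        HCatInputFormat inputFormat = new HCatInputFormat();

        // Enumerate the splits and read each one locally instead of on mappers.
        for (InputSplit split : inputFormat.getSplits(job)) {
            TaskAttemptContext ctx = new TaskAttemptContextImpl(
                job.getConfiguration(), new TaskAttemptID());
            RecordReader<WritableComparable, HCatRecord> reader =
                inputFormat.createRecordReader(split, ctx);
            reader.initialize(split, ctx);
            while (reader.nextKeyValue()) {
                HCatRecord record = reader.getCurrentValue();
                // Fields can be pulled out by name using the table schema.
                Object first = record.get(schema.getFieldNames().get(0), schema);
                System.out.println(first);
            }
            reader.close();
        }
    }
}
```

Note the obvious caveat from the thread: this pulls every record through the client, so it only makes sense for small tables or spot checks; anything large should keep the reads on the mappers.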