hive-user mailing list archives

From Dmitry Vasilenko <>
Subject Re: HCatalog access from a Java app
Date Fri, 13 Jun 2014 13:51:17 GMT
I am not sure about the Javadocs... ;-]
I have spent the last three years integrating with HCat, and to make it work
I had to go through the code...

So here are some samples that should help you get started. If you are
using Hive 0.12.0, I would not bother with the new APIs... I had to create
some shim classes for HCat to make my code version independent, but I cannot
share those.


1. To enumerate tables ... just use the Hive metastore client ... this seems
to be version independent

   // conf is a HiveConf and must contain the "hive.metastore.uris" property,
   // which points to your Hive Metastore Thrift server
   HiveMetaStoreClient hiveMetastoreClient = new HiveMetaStoreClient(conf);

   // this will get you all the databases
   List<String> databases = hiveMetastoreClient.getAllDatabases();
   // this will get you all the tables for the given database
   List<String> tables = hiveMetastoreClient.getAllTables(database);
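
A rough sketch of how the two calls in step 1 combine to list every table as
"database.table". To keep it runnable without Hive on the classpath, it is
written against a small hypothetical shim interface (MetastoreView) and an
in-memory stand-in (MapBackedMetastore); neither is part of Hive. A real
adapter would simply delegate to hiveMetastoreClient.getAllDatabases() and
hiveMetastoreClient.getAllTables(database) and handle their checked exceptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shim (not a Hive class): only the two metastore calls used in step 1.
interface MetastoreView {
    List<String> getAllDatabases();
    List<String> getAllTables(String database);
}

// In-memory stand-in so the sketch runs without a metastore.
class MapBackedMetastore implements MetastoreView {
    private final Map<String, List<String>> tablesByDatabase;

    MapBackedMetastore(Map<String, List<String>> tablesByDatabase) {
        this.tablesByDatabase = tablesByDatabase;
    }

    @Override
    public List<String> getAllDatabases() {
        return new ArrayList<String>(tablesByDatabase.keySet());
    }

    @Override
    public List<String> getAllTables(String database) {
        List<String> tables = tablesByDatabase.get(database);
        return tables == null ? new ArrayList<String>() : new ArrayList<String>(tables);
    }
}

class TableLister {
    // Walks every database and returns "database.table" for each table found.
    static List<String> listQualifiedTables(MetastoreView metastore) {
        List<String> qualified = new ArrayList<String>();
        for (String database : metastore.getAllDatabases()) {
            for (String table : metastore.getAllTables(database)) {
                qualified.add(database + "." + table);
            }
        }
        return qualified;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        Map<String, List<String>> sample = new LinkedHashMap<String, List<String>>();
        sample.put("default", Arrays.asList("clicks", "users"));
        sample.put("staging", Arrays.asList("raw_events"));
        for (String name : listQualifiedTables(new MapBackedMetastore(sample))) {
            System.out.println(name);
        }
    }
}
```

Hiding the client behind a narrow interface like this is also one way to keep
calling code independent of the Hive version in use.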

2. To get the table schema... I assume that you are after the HCat schema

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.mapreduce.HCatInputFormat;
import org.apache.hcatalog.mapreduce.HCatSplit;
import org.apache.hcatalog.mapreduce.InputJobInfo;

  Job job = new Job(config);
  job.setJarByClass(XXXXXX.class); // this will be your class
  // the third argument is a placeholder for your partition filter
  InputJobInfo inputJobInfo =
      InputJobInfo.create("my_data_base", "my_table", "partition filter");
  HCatInputFormat.setInput(job, inputJobInfo);
  HCatSchema s = HCatInputFormat.getTableSchema(job);
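
Once the schema is available, its column names can be paired with the values
of each record by position. The sketch below shows only that pairing, using
plain Java collections so it runs without HCatalog. It assumes the column
names were taken from the schema (HCatSchema exposes them, for example through
getFieldNames()) and that a record's values are available as a positional
list; RowMapper and the sample data are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowMapper {
    // Pairs each column name with the value at the same position in the row.
    static Map<String, Object> toNamedRow(List<String> columnNames, List<Object> values) {
        if (columnNames.size() != values.size()) {
            throw new IllegalArgumentException("schema has " + columnNames.size()
                    + " columns but row has " + values.size() + " values");
        }
        // LinkedHashMap keeps the columns in schema order.
        Map<String, Object> named = new LinkedHashMap<String, Object>();
        for (int i = 0; i < columnNames.size(); i++) {
            named.put(columnNames.get(i), values.get(i));
        }
        return named;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("id", "name", "score"); // made-up columns
        List<Object> row = Arrays.<Object>asList(7, "alice", 0.5);    // made-up values
        System.out.println(toNamedRow(columns, row));
    }
}
```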

3. To read the HCat records....

It depends on how you'd like to read the records ... will you be reading
ALL the records remotely from the client app,
or will you get input splits and read the records on mappers?

The code will be (somewhat) different... let me know...

On Fri, Jun 13, 2014 at 8:25 AM, Brian Jeltema <> wrote:

> Version 0.12.0.
> I’d like to obtain the table’s schema, scan a table partition, and use the
> schema to parse the rows.
> I can probably figure this out by looking at the HCatalog source. My
> concern was that
> the HCatalog packages in the Hive distributions are excluded in the
> JavaDoc, which implies
> that the API is not public. Is there a reason for this?
> Brian
> On Jun 13, 2014, at 9:10 AM, Dmitry Vasilenko <> wrote:
> You should be able to access this information. The exact API depends on
> the version of Hive/HCat. As you know, the earlier HCat API is being deprecated
> and will be removed in Hive 0.14.0. I can provide you with the code sample
> if you tell me what you are trying to do and what version of Hive you are
> using.
> On Fri, Jun 13, 2014 at 7:33 AM, Brian Jeltema <> wrote:
>> I’m experimenting with HCatalog, and would like to be able to access
>> tables and their schema
>> from a Java application (not Hive/Pig/MapReduce). However, the API seems
>> to be hidden, which
>> leads me to believe that this is not a supported use case. Is
>> HCatalog use limited to
>> one of the supported frameworks?
>> TIA
>> Brian
