hive-user mailing list archives

From amit jaiswal <amit_...@yahoo.com>
Subject Re: How to mount/proxy a db table in hive
Date Mon, 02 Aug 2010 06:46:08 GMT
The original data is stored in a database, and there is no need to create a
separate copy of it in HDFS for every job. More generally, the data could live
in any storage system. One way of abstracting this out would be to implement an
InputFormat that knows how to read the data and provides the correct InputSplit
and RecordReader implementations. The custom input format I mentioned works
fine in a plain Hadoop job.
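For illustration, here is a minimal, Hadoop-free sketch of how the InputSplit/RecordReader responsibilities could be divided for a database table. It assumes each split covers a contiguous range of rows and that the reader restricts the query with LIMIT/OFFSET, roughly the approach Hadoop's DBInputFormat takes for generic databases. The names RowRangeSplit, RowRangeSplitter and queryFor are hypothetical and not part of Hadoop or Hive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a non-file split: a contiguous range of table rows
// instead of a byte range of an HDFS file.
final class RowRangeSplit {
    final long start;   // offset of the first row (0-based)
    final long length;  // number of rows covered

    RowRangeSplit(long start, long length) {
        this.start = start;
        this.length = length;
    }
}

final class RowRangeSplitter {
    // Divide totalRows into numSplits contiguous ranges; the last split
    // absorbs the remainder so every row is covered exactly once.
    static List<RowRangeSplit> split(long totalRows, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be > 0");
        }
        List<RowRangeSplit> splits = new ArrayList<>();
        long chunk = totalRows / numSplits;
        for (int i = 0; i < numSplits; i++) {
            long start = i * chunk;
            long length = (i == numSplits - 1) ? totalRows - start : chunk;
            splits.add(new RowRangeSplit(start, length));
        }
        return splits;
    }

    // The query a per-split record reader would send to the database: the
    // base query restricted to this split's rows. ORDER BY keeps the row
    // ranges stable across the separate per-split queries.
    static String queryFor(String baseQuery, String orderBy, RowRangeSplit s) {
        return baseQuery + " ORDER BY " + orderBy
                + " LIMIT " + s.length + " OFFSET " + s.start;
    }
}
```

With 10 rows and 3 splits this yields ranges starting at rows 0, 3 and 6, the last one covering 4 rows. The ORDER BY matters because, without it, the database may return rows in a different order for each split's query.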

Is it possible to use Hive's support for custom input formats at table creation
to run such queries? Support for just 'select * from <table>' would be
sufficient, since the actual SQL query can be part of the InputFormat
implementation.
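A minimal sketch of the "query lives in the InputFormat" idea, assuming the table keeps Hive's default SerDe (LazySimpleSerDe, which expects Ctrl-A between fields and \N for NULL) and that the input format hands Hive one text line per database row. The configuration key and class name are hypothetical, and the database result is passed in as a list so the sketch runs without a JDBC connection.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical reader whose SQL comes from job configuration, not from Hive.
// Hive only asks for "all rows" (select *); the reader decides what to run
// and serializes each row in the layout the table's SerDe expects.
final class ConfiguredQueryReader implements Iterator<String> {
    // Hypothetical configuration key; not a real Hive or Hadoop property.
    static final String QUERY_KEY = "mycustominputformat.query";
    // Default field delimiter of Hive's LazySimpleSerDe (Ctrl-A).
    static final char FIELD_DELIM = '\u0001';

    private final String query;
    private final Iterator<Object[]> rows;

    ConfiguredQueryReader(Map<String, String> conf, List<Object[]> resultRows) {
        this.query = conf.get(QUERY_KEY);
        if (query == null) {
            throw new IllegalStateException(QUERY_KEY + " is not set");
        }
        // A real reader would execute `query` over JDBC; the rows are passed
        // in here so the sketch runs without a database.
        this.rows = resultRows.iterator();
    }

    String query() {
        return query;
    }

    @Override
    public boolean hasNext() {
        return rows.hasNext();
    }

    // Join the row's columns, in table column order, using the SerDe
    // delimiter; NULL becomes the default null marker \N.
    @Override
    public String next() {
        Object[] row = rows.next();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) {
                sb.append(FIELD_DELIM);
            }
            sb.append(row[i] == null ? "\\N" : row[i].toString());
        }
        return sb.toString();
    }
}
```

Keeping the SQL out of Hive means the Hive side only needs a table whose column order matches what the reader emits, for example the employee (id int, name string) table from the original question.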

-amit




________________________________
From: Sonal Goyal <sonalgoyal4@gmail.com>
To: hive-user@hadoop.apache.org
Sent: Mon, 2 August, 2010 12:03:32 PM
Subject: Re: How to mount/proxy a db table in hive

Hi Amit,

Hive needs data to be stored in its own namespace. Can you please explain why
you want to call the database through Hive?
 
Thanks and Regards,
Sonal
www.meghsoft.com
http://in.linkedin.com/in/sonalgoyal



On Mon, Aug 2, 2010 at 11:56 AM, amit jaiswal <amit_jus@yahoo.com> wrote:

>Hi,
>
>I have a database and am looking for a way to 'mount' a database table in Hive
>so that a select query in Hive gets translated into an SQL query against the
>database. I saw DBInputFormat and Sqoop, but nothing that can create a proxy
>table in Hive which internally makes database calls.
>
>I also tried to use a custom variant of DBInputFormat as the input format for
>the database table.
>
>create table employee (id int, name string) stored as INPUTFORMAT
>'mycustominputformat' OUTPUTFORMAT
>'org.apache.hadoop.mapred.SequenceFileOutputFormat';
>
>select id from employee;
>
>This fails when the Hadoop job runs, because HiveInputFormat only supports
>FileSplits.
>
>HiveInputFormat:
>   public long getStart() {
>     if (inputSplit instanceof FileSplit) {
>       return ((FileSplit)inputSplit).getStart();
>     }
>     return 0;
>   }
>
>Any suggestions? Is there an existing InputFormat implementation that can be
>used?
>
>-amit
>
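A small, self-contained model of the getStart() logic quoted above from HiveInputFormat, using stand-in classes instead of the real Hadoop types. It shows that a split which is not a FileSplit falls through to the default start of 0, and illustrates one possible workaround, assumed rather than verified against Hive: making the custom split subclass FileSplit with a placeholder path so the instanceof check succeeds. All class names here, and the db:// path scheme, are hypothetical.

```java
// Hadoop-free stand-in for org.apache.hadoop.mapred.FileSplit (NOT the real
// class), so the sketch compiles on its own.
class FileSplitStandIn {
    private final String path;
    private final long start;
    private final long length;

    FileSplitStandIn(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }

    String getPath() { return path; }
    long getStart() { return start; }
    long getLength() { return length; }
}

// A database split that is not a file split: instanceof checks like the
// one quoted above do not recognise it.
class PlainDbSplit {
    final long firstRow;
    final long rowCount;

    PlainDbSplit(long firstRow, long rowCount) {
        this.firstRow = firstRow;
        this.rowCount = rowCount;
    }
}

// Hypothetical workaround: subclass the file-split type with a placeholder
// path so the row offset survives as the split's "start".
class FileBackedDbSplit extends FileSplitStandIn {
    FileBackedDbSplit(String table, long firstRow, long rowCount) {
        super("db://" + table, firstRow, rowCount);
    }
}

final class SplitWrapper {
    // Mirrors the quoted getStart(): only file splits report a real start.
    static long getStart(Object inputSplit) {
        if (inputSplit instanceof FileSplitStandIn) {
            return ((FileSplitStandIn) inputSplit).getStart();
        }
        return 0;
    }
}
```

Whether a FileSplit subclass is enough for the whole Hive job to run (other code paths may also inspect the path) is not established here; the model only shows why the instanceof check discards information from other split types.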
