hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jonathan Hsieh <...@cloudera.com>
Subject Re: Faster Bulkload from Oracle to HBase
Date Tue, 31 Jan 2012 06:21:05 GMT
Hi Laxman,

I'm an Apache HBase and Sqoop committer.  I haven't run the comparison
you've suggested but my first thoughts are to consider #1 and #3.

Case 1 will natively export data which should be the fastest way to get
data out of Oracle.  You many need to do some reformatting to use HBase's
importtsv.

Case 2, writing your own DBInputFormat, is essentially duplicating what
Sqoop does in the generic case.  Here it is essentially executing sql
queries against the rdbms to get data out which will likely be a bit slower
than a database's native bulk export feature.

Case 3, consider using Apache Sqoop in conjunction with Quest's data
connector for oracle and hadoop.  It is a free as in beer plugin to sqoop!
 This is probably the fastest from a getting started point of view (no dev
time) but may not be as performant as #1.  I'd give it a try.
http://www.quest.com/data-connector-for-oracle-and-hadoop/

If you have more questions about Sqoop, feel free to cross post to
sqoop-user@incubator.apache.org!

Jon.

On Mon, Jan 30, 2012 at 5:48 AM, Laxman <lakshman.ch@huawei.com> wrote:

> Hi,
>
> We have the following use-case.
>
> We have data in relational database (Oracle).
> We need to export this data to HBase and perform analysis on this data.
> We need to perform this export-import 500G periodically, say every month.
>
> Following are the different approaches I can see as per my knowledge.
> Before testing and finding out the best way by myself, I wanted to listen
> from the experts here.
>
> Approach #1
> ===========
> 1) Export from Oracle to raw text file (Using Oracle export utility -
> Faster
> - Involves no transactional overhead)
>
> 2) Upload text file to HDFS
>
> 3) Run the bulk load job (HFileOutputFormat.configureIncrementalLoad())
>
> Approach #2
> ===========
> 1) Write a custom Job using DBInputFormat to directly read from database.
>        - Just a thought to avoid multiple hops(Oracle to Local FS, Local FS
> to HDFS, HDFS to HBase) involved in approach #1.
>
> 2) Use the HBase bulk load tool to load this data to
> HBase.(HFileOutputFormat.configureIncrementalLoad())
>
> Approach #3
> ===========
> 1) Use Apache Sqoop (Currently under incubation) to achieve my requirement.
>        - I'm not aware of the istability of this.
>
> Also, please suggest me if we have a better approach than the above.
>
>
>
>


-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message