hbase-user mailing list archives

From Amit Sela <am...@infolinks.com>
Subject Re: HBase as a transformation engine
Date Wed, 13 Nov 2013 09:34:20 GMT

We do something like that programmatically.
We read blobbed HBase data (qualifiers represent cross-sections such as
country_product, and the blobs hold data such as clicks, impressions, etc.).
We have several aggregation tasks (one per MySQL table) that aggregate the
data and insert it (in batches) into MySQL.
I don't know how much data you wanna scan and insert but we scan, aggregate
and insert approximately 7GB as ~12M lines from one HBase table into 9
MySQL tables and that takes a little bit less than 2 hours.
Our analysis shows that ~25% of that time is net HBase read and most of the
time is spent on MySQL inserts.
Since we are in the process of building a new system, optimizing is not on
our agenda, but I would definitely try writing to CSV and bulk loading into

Hope that helps.
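
A minimal sketch of the aggregate-then-CSV idea mentioned above, not our actual code. It assumes the HBase cells have already been scanned and decoded into (qualifier, clicks, impressions) tuples; the function names, the sample data and the batch helper are all hypothetical. The resulting file could then be handed to a bulk loader (for MySQL, for instance, LOAD DATA INFILE) instead of issuing row-by-row inserts.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def aggregate(cells):
    """Sum clicks and impressions per cross-section qualifier.

    `cells` is an iterable of (qualifier, clicks, impressions) tuples,
    assumed to be already decoded from the HBase blobs.
    """
    totals = defaultdict(lambda: [0, 0])
    for qualifier, clicks, impressions in cells:
        totals[qualifier][0] += clicks
        totals[qualifier][1] += impressions
    # Split "country_product" back into two columns for the output row.
    return [tuple(q.split("_", 1)) + (c, i)
            for q, (c, i) in sorted(totals.items())]


def batches(rows, size):
    """Yield successive lists of at most `size` rows (for batched inserts)."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def write_csv(rows, out):
    """Write aggregated rows as CSV, one line per cross-section."""
    csv.writer(out, lineterminator="\n").writerows(rows)


# Hypothetical sample standing in for decoded HBase cells.
cells = [("us_shoes", 3, 100), ("fr_hats", 1, 40), ("us_shoes", 2, 50)]
rows = aggregate(cells)

buf = io.StringIO()  # a real job would open a file on disk or HDFS instead
write_csv(rows, buf)
```

If you stay with plain inserts, batches(rows, 1000) gives you fixed-size chunks to pass to your driver's multi-row insert call. The batch size here is a guess, not a measured value.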

On Wed, Nov 13, 2013 at 9:11 AM, Vincent Barat <vincent.barat@gmail.com> wrote:

> Hi,
> We have done this kind of thing using HBase 0.92.1 + Pig, but we finally
> had to limit the size of the tables and move the biggest data to HDFS:
> loading data directly from HBase is much slower than from HDFS, and doing
> it using M/R overloads HBase region servers, since several map tasks scan
> table regions at the same time: so the bigger your tables are, the higher
> the load is (usually Pig creates 1 map per region, I don't know about Hive).
> This may not be an issue if your HBase cluster is dedicated to this kind
> of job, but if you also have to ensure a good random read latency at the
> same time, forget it.
> Regards,
> On 11/11/2013 13:10, JC wrote:
>> We are looking to use hbase as a transformation engine. In other words, take
>> data already loaded into hbase, run some large calculation/aggregation on
>> that data and then load it back into a rdbms for our BI analytic tools to
>> use. I was curious about what the community's experience is with this and
>> whether there are some best practices. One idea we are kicking around is
>> using MapReduce 2 and YARN and writing files to HDFS to be loaded into the
>> rdbms. Not sure what all the pieces needed for the complete application
>> are, though.
>> Thanks in advance for your help,
>> JC
>> --
>> View this message in context: http://apache-hbase.679495.n3.nabble.com/HBase-as-a-transformation-engine-tp4052670.html
>> Sent from the HBase User mailing list archive at Nabble.com.
