hbase-user mailing list archives

From Vincent Barat <vincent.ba...@gmail.com>
Subject Re: HBase as a transformation engine
Date Wed, 13 Nov 2013 07:11:40 GMT

We did this kind of thing using HBase 0.92.1 + Pig, but we eventually 
had to cap the size of the tables and move the biggest datasets to 
HDFS: loading data directly from HBase is much slower than loading 
from HDFS, and doing it via M/R overloads the HBase region servers, 
because several map tasks scan table regions at the same time. So the 
bigger your tables are, the higher the load is: Pig usually creates 
one map task per region (I don't know about Hive).
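To make the scan pattern concrete, here is a minimal Pig sketch of a load-and-aggregate job straight against HBase (the table name `events`, column family `d`, and field names are all illustrative, not from the original setup). Pig's HBaseStorage loader splits the input by region, so each region server ends up serving parallel scans for all of its regions at once:

```pig
-- Hypothetical table 'events' with column family 'd'.
-- HBaseStorage creates roughly one map task per table region,
-- which is why big tables put proportionally more scan load
-- on the region servers.
raw = LOAD 'hbase://events'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
          'd:user d:bytes', '-loadKey true')
      AS (rowkey:chararray, user:chararray, bytes:long);

grouped = GROUP raw BY user;
totals  = FOREACH grouped GENERATE group AS user,
                                   SUM(raw.bytes) AS total;
STORE totals INTO '/output/user_totals';
```

This is a cluster-dependent sketch; it needs a running HBase/Hadoop cluster and the Pig HBase integration on the classpath.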

This may not be an issue if your HBase cluster is dedicated to this 
kind of job, but if you also need to maintain good random-read 
latency at the same time, forget it.
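For reference, one way to move the big tables out of the scan path is the Export M/R job that ships with HBase, which copies a table to SequenceFiles on HDFS in a single pass (the table name and output path below are illustrative):

```shell
# One-shot M/R export of table 'events' to SequenceFiles on HDFS.
# Requires a running HBase/Hadoop cluster; names are illustrative.
hbase org.apache.hadoop.hbase.mapreduce.Export events /data/export/events
```

Subsequent Pig or Hive jobs can then read `/data/export/events` from HDFS without touching the region servers at all.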


On 11/11/2013 at 13:10, JC wrote:
> We are looking to use HBase as a transformation engine. In other words, take
> data already loaded into HBase, run some large calculation/aggregation on
> that data, and then load it back into an RDBMS for our BI analytic tools to
> use. I was curious what the community's experience with this is and whether
> there are some best practices. One idea we are kicking around is using
> MapReduce 2 and YARN and writing files to HDFS to be loaded into the RDBMS.
> Not sure what all the pieces are needed for the complete application though.
> Thanks in advance for your help,
> JC
> --
> View this message in context: http://apache-hbase.679495.n3.nabble.com/HBase-as-a-transformation-engine-tp4052670.html
> Sent from the HBase User mailing list archive at Nabble.com.
