hbase-user mailing list archives

From "Steinmaurer Thomas" <Thomas.Steinmau...@scch.at>
Subject Incremental pre-aggregation strategy with MapReduce
Date Fri, 02 Sep 2011 06:06:41 GMT
Hello,

 

We are storing detailed measurement values in a Hadoop/HBase cluster.
For end-user and analysis tasks, we need to provide aggregated values
along a date dimension (aggregated by day, month, quarter, and year).
The aggregates will be stored in an Oracle database for easier data
handling via different client types (OLAP clients, ...).

 

A brute-force approach for generating the aggregates is to run a
nightly MapReduce job that processes the entire HBase table and
performs the aggregation.
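
To make the full-scan aggregation concrete, here is a minimal sketch in
plain Python rather than an actual MapReduce job. The record layout
(timestamp, value), the function names rollup_keys and aggregate, and
the use of a sum as the aggregate function are assumptions made purely
for illustration; they are not taken from our real schema.

```python
from collections import defaultdict


def rollup_keys(ts):
    """Return the day, month, quarter and year bucket keys for a datetime."""
    quarter = (ts.month - 1) // 3 + 1
    return [
        ("day", ts.strftime("%Y-%m-%d")),
        ("month", ts.strftime("%Y-%m")),
        ("quarter", f"{ts.year}-Q{quarter}"),
        ("year", str(ts.year)),
    ]


def aggregate(records):
    """Full-scan aggregation over (datetime, value) pairs.

    Every record contributes to four buckets, one per level of the
    date dimension. Summing is an assumed aggregate function.
    """
    totals = defaultdict(float)
    for ts, value in records:          # "map" step: one pair per bucket
        for key in rollup_keys(ts):
            totals[key] += value       # "reduce" step: sum per bucket
    return dict(totals)
```

Because every run touches every record, the cost grows with the size of
the whole table rather than with the amount of data that changed.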

 

I wonder whether there are any best practices for doing the
pre-aggregation incrementally with a MapReduce job. For example, how
can changes in HBase since the last MR job run be detected?

 

Thanks!

 

Regards,

Thomas

 

