hbase-user mailing list archives

From "Steinmaurer Thomas" <Thomas.Steinmau...@scch.at>
Subject Incremental pre-aggregation strategy with MapReduce
Date Fri, 02 Sep 2011 06:06:41 GMT


We are storing detailed measurement values in a Hadoop/HBase cluster.
For end-user and analysis tasks, we need to provide aggregated values
along a date dimension (aggregated by day, month, quarter and year). The
aggregates are to be stored in an Oracle database for easier data
handling by different client types (OLAP clients, ...).


A brute-force approach to generating the aggregates is to run a nightly
MapReduce job that processes the entire HBase table and performs the
aggregation.
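The brute-force job can be sketched as a local, in-memory simulation of the MapReduce data flow (not Hadoop or HBase code). For every measurement, the map step emits one (level, bucket) key per date level, and the reduce step keeps a sum and a count per key, so an average is simply sum/count. The (epoch_millis, value) input format and the UTC bucketing are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical input: (epoch_millis, value) pairs standing in for HBase cells.

def date_keys(ts_millis):
    """Map one timestamp to its bucket at every level of the date dimension (UTC assumed)."""
    d = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
    quarter = (d.month - 1) // 3 + 1
    return {
        "day": d.strftime("%Y-%m-%d"),
        "month": d.strftime("%Y-%m"),
        "quarter": f"{d.year}-Q{quarter}",
        "year": str(d.year),
    }

def map_phase(cells):
    # Like a Mapper: emit ((level, bucket), value) once per date level.
    for ts, value in cells:
        for level, bucket in date_keys(ts).items():
            yield (level, bucket), value

def reduce_phase(pairs):
    # Like shuffle + Reducer: group by key and keep [sum, count] per bucket.
    totals = defaultdict(lambda: [0.0, 0])
    for key, value in pairs:
        totals[key][0] += value
        totals[key][1] += 1
    return {key: (s, c) for key, (s, c) in totals.items()}

def full_recompute(cells):
    # Every run reads all cells, which is exactly what makes this approach costly.
    return reduce_phase(map_phase(cells))
```

Keeping sum and count rather than a ready-made average means the buckets stay additive. This property is what makes an incremental variant possible at all.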


I wonder whether there are any best practices for doing the
pre-aggregation incrementally with a MapReduce job. For example, how can
we detect changes in HBase since the last MR job run?
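One possible incremental scheme is sketched below under stated assumptions. It is not an established best practice. The idea is to remember the end of the previous run as a watermark and select only cells written in the half-open window [last_run, now). On the HBase side this would roughly correspond to `Scan.setTimeRange(minStamp, maxStamp)` on the scan passed to `TableMapReduceUtil.initTableMapperJob`. The job then folds the resulting daily partial sums and counts into the stored daily totals and re-derives month, quarter and year from the small daily table. The cell model (write timestamp, measured day, value) and all function names are hypothetical. The scheme also assumes append-only data with server-assigned write timestamps. Overwritten values would be counted twice, deletes are not visible in a time-range scan, and client-supplied timestamps could let late arrivals fall outside the window.

```python
from collections import defaultdict

# Hypothetical cell model: (write_ts_millis, measured_day "YYYY-MM-DD", value).
# write_ts plays the role of the HBase cell timestamp; measured_day would
# typically be derived from the row key.

def changed_since(cells, last_run, now):
    # Half-open window [last_run, now), mirroring an HBase time-range scan.
    return [c for c in cells if last_run <= c[0] < now]

def daily_partials(cells):
    partial = defaultdict(lambda: [0.0, 0])
    for _, day, value in cells:
        partial[day][0] += value
        partial[day][1] += 1
    return partial

def merge_daily(stored, partial):
    # Sum and count are additive, so each delta folds into the stored totals.
    merged = dict(stored)
    for day, (s, c) in partial.items():
        old_s, old_c = merged.get(day, (0.0, 0))
        merged[day] = (old_s + s, old_c + c)
    return merged

def roll_up(daily):
    # Coarser levels are re-derived from the daily table, not from HBase.
    out = defaultdict(lambda: [0.0, 0])
    for day, (s, c) in daily.items():
        year, month = int(day[:4]), int(day[5:7])
        quarter = (month - 1) // 3 + 1
        for key in (("month", day[:7]), ("quarter", f"{year}-Q{quarter}"), ("year", str(year))):
            out[key][0] += s
            out[key][1] += c
    return {k: tuple(v) for k, v in out.items()}

def incremental_run(cells, stored_daily, last_run, now):
    delta = daily_partials(changed_since(cells, last_run, now))
    daily = merge_daily(stored_daily, delta)
    return daily, roll_up(daily), now  # 'now' becomes the next run's watermark
```

In this sketch only the daily totals would need to be persisted between runs (for example in the Oracle database), together with the watermark. The cost of each run then depends on the volume of new data rather than on the size of the whole table.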

