hbase-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Strategies for aggregating data in a HBase table
Date Wed, 21 Dec 2011 08:13:29 GMT
Thomas,

Sorry for the shameless self-promotion, but could you take a look at our
hbase-lattice project? It does incremental OLAP-ish cube compilation,
with custom filtering to optimize composite key scans. There is a
rudimentary query language as well.

It has a bunch of standard (and not so standard) aggregates for measure
data, and it is relatively easy to add a user-defined aggregate through
the model definition.

It is at a very early stage, but see whether it could fit your purpose,
and maybe share some perspectives, since I am honestly not an expert on
dimensional data representation.

(I guess I need to add some query shell so people can try it out more easily.)

On Mon, Nov 28, 2011 at 1:55 AM, Steinmaurer Thomas
<Thomas.Steinmaurer@scch.at> wrote:
> Hello,
>
>
>
> This has already been discussed a bit in the past, but I'm trying to
> revive the thread, as this is an important design issue in our HBase
> evaluation.
>
>
>
> Basically, the result of our evaluation was that we are going to be
> happy with what Hadoop/HBase offers for managing our measurement/sensor
> data. One crucial requirement, however, e.g. for backend analysis
> tasks, is very quick access to aggregated data. The idea is to run a
> MapReduce job and store the daily aggregates in an RDBMS, which allows
> us to access aggregated data more easily via different tools (BI
> frontends etc.). Monthly and yearly aggregates are then handled with
> RDBMS concepts like Materialized Views and Partitioning.
>
>
>
> While processing the entire HBase table, e.g. every night, is an option
> when we go live, it probably won't be once the data volume grows over
> the years. So, what options are there for incrementally aggregating
> only new data?
>
>
>
> - Perhaps using versioning (internal timestamp) might be an option?
>
> - Perhaps having some kind of HBase (daily) staging table which is
> truncated after aggregating data is an option?
>
> - How could Co-processors help here (at the time of the Go-Live, they
> might be available in e.g. Cloudera)?
>
>
>
> etc.
>
>
>
> Any ideas/comments are appreciated.
>
>
>
> Thanks,
>
> Thomas
>
>
>
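
To make the "versioning (internal timestamp)" option above a bit more concrete, here is a minimal sketch of watermark-based incremental aggregation. It is an in-memory simulation in plain Python, not HBase client code: the cell tuples, aggregate_increment, day_of and ms are all hypothetical names made up for illustration, and it assumes the write timestamp of a cell is also the time the measurement belongs to. In a real job, the time-window filter would presumably be pushed down to the scan (e.g. via Scan.setTimeRange in the Java client) rather than applied in a loop.

```python
from collections import defaultdict
from datetime import datetime, timezone


def ms(year, month, day, hour=0):
    """Helper: UTC wall-clock time as epoch milliseconds (HBase-style timestamp)."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp() * 1000)


def day_of(ts_ms):
    """Map an epoch-millisecond timestamp to its UTC calendar day."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")


def aggregate_increment(cells, last_ts, now_ts, totals):
    """Fold cells with last_ts <= write_ts < now_ts into totals.

    cells  : iterable of (sensor_id, write_ts_ms, value) tuples (hypothetical layout)
    totals : dict mapping (sensor_id, day) -> (count, sum), updated in place
    Returns now_ts, to be stored as the watermark for the next run.
    """
    for sensor, write_ts, value in cells:
        if last_ts <= write_ts < now_ts:
            key = (sensor, day_of(write_ts))
            count, total = totals[key]
            totals[key] = (count + 1, total + value)
    return now_ts


# Simulated table contents.
cells = [
    ("s1", ms(2011, 11, 27, 10), 2.0),
    ("s1", ms(2011, 11, 27, 12), 4.0),
    ("s1", ms(2011, 11, 28, 9), 6.0),
]
totals = defaultdict(lambda: (0, 0.0))

# First nightly run: everything written before 2011-11-28 00:00 UTC.
watermark = aggregate_increment(cells, 0, ms(2011, 11, 28), totals)

# New data arrives; the second run only touches cells at or after the watermark.
cells.append(("s2", ms(2011, 11, 28, 15), 1.0))
watermark = aggregate_increment(cells, watermark, ms(2011, 11, 29), totals)
```

The half-open window [last_ts, now_ts) means a cell is counted in exactly one run, as long as the watermark is persisted between runs. Data written late with an older timestamp would be missed by this scheme, so that case would need separate handling.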
