hbase-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Strategies for aggregating data in a HBase table
Date Wed, 21 Dec 2011 08:16:28 GMT
Also, re: frontend: it is always a problem. So far we have a custom data
source for this thing in Jasper Reports, but JDBC is eventually also
possible. I am looking into what it takes to mount JPivot on it, but that
is a more serious endeavor, so no big expectations there (unless I find
somebody willing to help).

On Wed, Dec 21, 2011 at 12:14 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> https://github.com/dlyubimov/HBase-Lattice
>
> On Wed, Dec 21, 2011 at 12:13 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> Thomas,
>>
>> Sorry for the shameless self-promotion. Can you look at our hbase-lattice
>> project? It is incremental OLAP-ish cube compilation with custom
>> filtering to optimize for composite key scans. It has a rudimentary query
>> language as well.
>>
>> It has a bunch of standard (and not so standard) aggregates for measure
>> data, and the ability to add user aggregates relatively easily through
>> the model definition.
>>
>> It is at a very early stage. But see if it could fit your purpose, and
>> maybe even share some perspectives, since I am honestly not an expert on
>> dimensional data representation.
>>
>> (I guess I need to add some query shell so people can try it out more easily.)
>>
>> On Mon, Nov 28, 2011 at 1:55 AM, Steinmaurer Thomas
>> <Thomas.Steinmaurer@scch.at> wrote:
>>> Hello,
>>>
>>>
>>>
>>> this has already been discussed a bit in the past, but I'm trying to
>>> refresh this thread as this is an important design issue in our HBase
>>> evaluation.
>>>
>>>
>>>
>>> Basically, the result of our evaluation was that we are going to be happy
>>> with what Hadoop/HBase offers for managing our measurement/sensor data.
>>> One crucial thing, though, e.g. for backend analysis tasks, is that we
>>> need very quick access to aggregated data. The idea is to run a MapReduce
>>> job and store the daily aggregates in an RDBMS, which allows us to access
>>> aggregated data more easily via different tools (BI frontends etc.).
>>> Monthly and yearly aggregates are then handled with RDBMS concepts like
>>> Materialized Views and Partitioning.
>>>
>>>
>>>
>>> While processing the entire HBase table, e.g. every night, is an option
>>> when we go live, it probably isn't an option once data volume grows over
>>> the years. So, what options are there for incrementally aggregating
>>> only new data?
>>>
>>>
>>>
>>> - Perhaps using versioning (internal timestamp) might be an option?
>>>
>>> - Perhaps having some kind of HBase (daily) staging table which is
>>> truncated after aggregating data is an option?
>>>
>>> - How could coprocessors help here (by the time of the go-live, they
>>> might be available in e.g. Cloudera)?
>>>
>>>
>>>
>>> etc.
>>>
>>>
>>>
>>> Any ideas/comments are appreciated.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Thomas
>>>
>>>
>>>
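
The versioning idea from the original question (aggregate only cells whose
internal timestamp is newer than the last run) can be sketched without HBase
at all. The following is a minimal Python illustration under stated
assumptions, not anything taken from hbase-lattice: `Cell`, `write_ts`, and
`aggregate_increment` are hypothetical names, and the list of cells stands in
for a table scan. Against a real table the same half-open window would
typically be expressed by restricting the scan to a time range (the HBase Java
client has `Scan.setTimeRange(minStamp, maxStamp)` for that). The aggregates
are kept as count/sum/min/max so that each run can be merged into the previous
result instead of recomputing it.

```python
from collections import namedtuple

# Hypothetical stand-in for an HBase cell; write_ts plays the role of the
# cell's internal (version) timestamp, day is the measurement day.
Cell = namedtuple("Cell", "sensor day value write_ts")


def aggregate_increment(cells, daily, min_ts, max_ts):
    """Fold cells written in [min_ts, max_ts) into the running daily aggregates.

    Returns max_ts, which the caller stores as the watermark for the next run.
    """
    for c in cells:
        if not (min_ts <= c.write_ts < max_ts):
            continue  # already aggregated by an earlier run, or not yet due
        key = (c.sensor, c.day)
        agg = daily.get(key)
        if agg is None:
            daily[key] = {"count": 1, "sum": c.value, "min": c.value, "max": c.value}
        else:
            agg["count"] += 1
            agg["sum"] += c.value
            agg["min"] = min(agg["min"], c.value)
            agg["max"] = max(agg["max"], c.value)
    return max_ts


cells = [
    Cell("s1", "2011-11-27", 10.0, 100),
    Cell("s1", "2011-11-27", 14.0, 150),
    Cell("s1", "2011-11-28", 7.0, 220),
    Cell("s1", "2011-11-27", 6.0, 230),  # late arrival for the previous day
]

daily = {}
watermark = aggregate_increment(cells, daily, 0, 200)          # first nightly run
watermark = aggregate_increment(cells, daily, watermark, 300)  # next run, new cells only
```

Because the window is keyed on write time rather than measurement time, the
late arrival for 2011-11-27 is still folded into that day's aggregate on the
second run. The sketch assumes cells are only ever appended; overwrites or
deletes of already-aggregated cells would need separate handling.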
