Subject: Re: Strategies for aggregating data in a HBase table
From: Dmitriy Lyubimov
To: user@hbase.apache.org
Date: Wed, 21 Dec 2011 00:16:28 -0800

Also, re: the frontend: that is always a problem. So far we have a custom
data source for this in JasperReports, but JDBC is eventually possible as
well. I'm looking into what it takes to mount JPivot on it, but that is a
more serious endeavor, so no big expectations there (unless I find somebody
willing to help).

On Wed, Dec 21, 2011 at 12:14 AM, Dmitriy Lyubimov wrote:
> https://github.com/dlyubimov/HBase-Lattice
>
> On Wed, Dec 21, 2011 at 12:13 AM, Dmitriy Lyubimov wrote:
>> Thomas,
>>
>> Sorry for the shameless self-promotion, but could you take a look at our
>> HBase-Lattice project? It is incremental, OLAP-ish cube compilation with
>> custom filtering to optimize composite key scans, and it has a
>> rudimentary query language as well.
>>
>> It provides a set of standard (and not-so-standard) aggregates for
>> measure data, plus the ability to add user-defined aggregates relatively
>> easily through the model definition.
>>
>> It is at a very early stage, but see whether it could fit your purpose;
>> maybe we could even share some perspectives, since I am honestly not an
>> expert on dimensional data representation.
>>
>> (I guess I need to add a query shell so people can try it out more
>> easily.)
>>
>> On Mon, Nov 28, 2011 at 1:55 AM, Steinmaurer Thomas wrote:
>>> Hello,
>>>
>>> This has already been discussed a bit in the past, but I'm reviving the
>>> thread because it is an important design issue in our HBase evaluation.
>>>
>>> Basically, the result of our evaluation was that we will be happy with
>>> what Hadoop/HBase offers for managing our measurement/sensor data. One
>>> crucial requirement for backend analysis tasks, though, is that we need
>>> access to aggregated data very quickly.
>>> The idea is to run a MapReduce job and store the daily aggregates in an
>>> RDBMS, which lets us access the aggregated data more easily via
>>> different tools (BI frontends etc.). Monthly and yearly aggregates are
>>> then handled with RDBMS concepts such as materialized views and
>>> partitioning.
>>>
>>> While processing the entire HBase table (e.g. every night) is an option
>>> when we go live, it probably won't be once data volume grows over the
>>> years. So, what options are there for incrementally aggregating only
>>> new data?
>>>
>>> - Perhaps using versioning (the internal timestamp) might be an option?
>>>
>>> - Perhaps having some kind of (daily) HBase staging table which is
>>> truncated after the data has been aggregated?
>>>
>>> - How could coprocessors help here (by the time of our go-live, they
>>> might be available in e.g. Cloudera)?
>>>
>>> etc.
>>>
>>> Any ideas/comments are appreciated.
>>>
>>> Thanks,
>>>
>>> Thomas
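[Editor's note] The staging-table option Thomas raises can be sketched in plain Java, independent of HBase, as below. All class, method, and key names here are invented for illustration; a real implementation would run the nightly fold as a MapReduce job over an actual HBase staging table, or, for the timestamp-based variant, bound the scan to rows newer than the last run (HBase scans can be restricted by timestamp range). This is a minimal sketch of the control flow only, not a definitive implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the "daily staging table" idea: new measurements land in a
// staging structure, a nightly job folds only those rows into running
// per-sensor aggregates, and the staging structure is then truncated.
// In-memory maps stand in for HBase tables here.
public class IncrementalAggregator {

    // running aggregate per sensor: sum and count of observed values
    static class Agg {
        double sum;
        long count;
    }

    final Map<String, Agg> aggregates = new HashMap<>();     // "result table"
    final Map<String, Double> staging = new HashMap<>();     // "staging table"

    // writes go to the staging table; rowKey = sensorId + '/' + timestamp
    void write(String rowKey, double value) {
        staging.put(rowKey, value);
    }

    // nightly job: fold the staged rows into the aggregates, then truncate
    void runNightlyAggregation() {
        for (Map.Entry<String, Double> e : staging.entrySet()) {
            String sensorId = e.getKey().split("/")[0];
            Agg agg = aggregates.computeIfAbsent(sensorId, k -> new Agg());
            agg.sum += e.getValue();
            agg.count++;
        }
        staging.clear(); // "truncate" the staging table
    }

    double average(String sensorId) {
        Agg a = aggregates.get(sensorId);
        return a == null ? Double.NaN : a.sum / a.count;
    }

    public static void main(String[] args) {
        IncrementalAggregator ia = new IncrementalAggregator();
        ia.write("s1/1000", 10.0);
        ia.write("s1/2000", 20.0);
        ia.write("s2/1000", 5.0);
        ia.runNightlyAggregation();
        // next day: only the newly staged row is processed
        ia.write("s1/3000", 30.0);
        ia.runNightlyAggregation();
        System.out.println(ia.average("s1")); // 20.0
        System.out.println(ia.average("s2")); // 5.0
    }
}
```

The key property, and the reason the staging table answers the question in the thread, is that each nightly run touches only the rows written since the previous run, so its cost tracks the daily data volume rather than the total table size.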