Date: Wed, 11 Jan 2012 11:52:17 -0800
Subject: Re: HBase for ad-hoc aggregate queries
From: Dmitriy Lyubimov
To: user@hbase.apache.org

Bottom line, imo you have to consider how your data is organized. For
90% of relational schema (but perhaps 10% of volume) the move to
HBase-based solutions is not warranted. However, for 10% of the schema
(and 90% of the volume) you may consider using HBase-based solutions.
Most typically these are time series data feeds.

-d

On Wed, Jan 11, 2012 at 11:48 AM, Dmitriy Lyubimov wrote:
> IMO You will never get the same flexibility. There are also numerous
> differences in data modelling approach (TTL, uniformly-distributed ids
> requirement to scale query volume, etc.)
>
> The most flexibility in that regard we reached so far w.r.t.
> aggregation queries is OLAPish model (see link on HBase wiki,
> supported projects, HBase-Lattice).
>
> This is for aggregating really high qps RT fact streams and the list
> of current limitations is huge but it serves our purpose so far.
>
> Most obvious benefits are that queries are fast (because of
> precomputed cuboids in a lattice, similar to cuboid lattice approach
> in ROLAP), short incremental compilation cycle (one can grow and
> update the cube in just a few minutes after the fact got fed into
> system), and one can scale compilation horizontally for high volume
> fact feeds. There's a fairly limited query language and a basic set of
> aggregate functions (along with some weighted time series aggregates
> as well).
>
> Most severe limitation right now is lack of commonly used
> multidimensional query dialect such as MDX which prevents use of the
> widely used UI pivoting exploratory clients such as Excel or JPivot or
> Tableau etc.
> So it is either custom UI integration or custom data
> source providers for canned reports with tools like Pentaho and
> Jasper, or some RT decisioning framework that doesn't require any UI
> at all and can use java API. I also plan to enable R to run queries
> against it (cause i personally don't believe in doing ml or analytics
> using Excel).
>
> -d
>
> On Wed, Jan 11, 2012 at 10:59 AM, kfarmer wrote:
>>
>> I'm taking a look at moving our datastore from Oracle to HBase, and trying to
>> understand how HBase could be used for ad-hoc aggregation queries across our
>> data.
>>
>> My understanding is MapReduce is more of a batch framework, so if we want a
>> query to come back to the user's request in a few seconds, that won't work
>> because of the overhead of running MR and because the MR jobs write back to
>> a new table. Is that correct?
>>
>> Instead should we be pre-aggregating data as we load into separate tables,
>> and then when a user queries instead just do a scan on these pre-aggregated
>> tables?
>>
>> Thanks.
>> --
>> View this message in context: http://old.nabble.com/HBase-for-ad-hoc-aggregate-queries-tp33123313p33123313.html
>> Sent from the HBase User mailing list archive at Nabble.com.
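
For illustration only, here is a rough sketch of the pre-aggregate-on-load idea discussed in this thread: every incoming fact updates one row per cuboid (each subset of the dimensions), and a query becomes a prefix scan over the precomputed rows. This is a plain in-memory Python stand-in, not HBase-Lattice and not the HBase client API. The dimension names, the `RollupStore` class, and the row key layout are all hypothetical, picked just to make the idea concrete.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("country", "device")  # hypothetical dimension names


def cuboid_keys(fact):
    """Yield one row key per cuboid, i.e. per subset of the dimensions."""
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            # Assumed key layout: cuboid name, then '|'-separated dimension values.
            cuboid = ",".join(dims) if dims else "*"
            values = "|".join(str(fact[d]) for d in dims)
            yield f"{cuboid}|{values}"


class RollupStore:
    """In-memory stand-in for a pre-aggregated table (not an HBase client)."""

    def __init__(self):
        self.rows = defaultdict(lambda: {"count": 0, "sum": 0.0})

    def load(self, fact):
        # Update every cuboid at load time so queries never touch raw facts.
        for key in cuboid_keys(fact):
            row = self.rows[key]
            row["count"] += 1
            row["sum"] += fact["value"]

    def scan(self, prefix):
        # Emulates a prefix scan over lexicographically sorted row keys.
        return [(k, dict(v)) for k, v in sorted(self.rows.items())
                if k.startswith(prefix)]


store = RollupStore()
for fact in [
    {"country": "US", "device": "phone", "value": 2.0},
    {"country": "US", "device": "tablet", "value": 3.0},
    {"country": "DE", "device": "phone", "value": 5.0},
]:
    store.load(fact)

# Totals per country, read straight from the precomputed "country" cuboid.
by_country = store.scan("country|")
```

The trade-off is the one described above: reads are cheap because the aggregates already exist, but each fact costs one write per cuboid (2^n for n dimensions), and only the aggregations chosen up front can be answered.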