In-Reply-To: 
 <CAPud8Tqrxy-nQinN6NrVP6g1dfAdyopgiSgQ_ogyW518fbUG2g@mail.gmail.com>
References: <33123313.post@talk.nabble.com>
	<CAPud8Tqrxy-nQinN6NrVP6g1dfAdyopgiSgQ_ogyW518fbUG2g@mail.gmail.com>
Date: Wed, 11 Jan 2012 11:52:17 -0800
Message-ID: 
 <CAPud8TrK6x=_B-NrPNGpoQYk=-8uQvgU-5BeFNj37Ts6PZuJUg@mail.gmail.com>
Subject: Re: HBase for ad-hoc aggregate queries
From: Dmitriy Lyubimov <dlieu.7@gmail.com>
To: user@hbase.apache.org

Bottom line, IMO you have to consider how your data is organized. For
90% of a relational schema (but perhaps 10% of the volume) the move to
HBase-based solutions is not warranted.

However, for 10% of the schema (and 90% of the volume) you may
consider using HBase-based solutions. Most typically these are time
series data feeds.

-d

On Wed, Jan 11, 2012 at 11:48 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> IMO you will never get the same flexibility. There are also numerous
> differences in the data modelling approach (TTL, the requirement for
> uniformly distributed ids to scale query volume, etc.)
>
> The most flexibility we have reached so far w.r.t. aggregation
> queries is with an OLAP-ish model (see the link on the HBase wiki
> under supported projects: HBase-Lattice).
>
> This is for aggregating really high qps RT fact streams and the list
> of current limitations is huge but it serves our purpose so far.
>
> The most obvious benefits are that queries are fast (because of
> precomputed cuboids in a lattice, similar to the cuboid lattice
> approach in ROLAP), the incremental compilation cycle is short (one
> can grow and update the cube just a few minutes after a fact is fed
> into the system), and one can scale compilation horizontally for high
> volume fact feeds. There's a fairly limited query language and a basic
> set of aggregate functions (along with some weighted time series
> aggregates).
>
> The most severe limitation right now is the lack of a commonly used
> multidimensional query dialect such as MDX, which prevents use of
> widely used UI pivoting exploratory clients such as Excel, JPivot or
> Tableau. So it is either custom UI integration, or custom data
> source providers for canned reports with tools like Pentaho and
> Jasper, or some RT decisioning framework that doesn't require any UI
> at all and can use the Java API. I also plan to enable R to run queries
> against it (because I personally don't believe in doing ML or analytics
> using Excel).
>
> -d
>
> On Wed, Jan 11, 2012 at 10:59 AM, kfarmer <kfarmer@camstar.com> wrote:
>>
>> I'm taking a look at moving our datastore from Oracle to HBase, and trying to
>> understand how HBase could be used for ad-hoc aggregation queries across our
>> data.
>>
>> My understanding is MapReduce is more of a batch framework, so if we want a
>> query to come back to the user's request in a few seconds, that won't work
>> because of the overhead of running MR and because the MR jobs write back to
>> a new table. Is that correct?
>>
>> Instead should we be pre-aggregating data as we load into separate tables,
>> and then when a user queries instead just do a scan on these pre-aggregated
>> tables?
>>
>> Thanks.
>> --
>> View this message in context: http://old.nabble.com/HBase-for-ad-hoc-aggregate-queries-tp33123313p33123313.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
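
To make the pre-aggregation idea from this thread a bit more concrete, here
is a minimal in-memory sketch of rolling each incoming fact up into every
cuboid of a lattice, so that a query becomes a single lookup instead of a
scan over raw facts. This is only an illustration under assumptions: the
dimension names, the "clicks" measure and the Cube class are hypothetical,
and it is not how HBase-Lattice is actually implemented. In an HBase
setting one might map each cell to a row in a pre-aggregated table, but
that mapping is an assumption as well.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimension names, chosen only for this illustration.
DIMENSIONS = ("site", "country", "hour")


def cuboid_keys(fact):
    """Yield one (dims, values) key per cuboid, i.e. per subset of DIMENSIONS."""
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            yield dims, tuple(fact[d] for d in dims)


class Cube:
    """Toy cube: every fact is aggregated into all 2^n cuboids at load time."""

    def __init__(self):
        # (dims, values) -> [sum of measure, fact count]
        self.cells = defaultdict(lambda: [0.0, 0])

    def load(self, fact, measure="clicks"):
        # Incremental update: each new fact touches one cell per cuboid.
        for key in cuboid_keys(fact):
            cell = self.cells[key]
            cell[0] += fact[measure]
            cell[1] += 1

    def query(self, **filters):
        # Pick the cuboid whose dimensions match the filter, then look up one cell.
        dims = tuple(d for d in DIMENSIONS if d in filters)
        values = tuple(filters[d] for d in dims)
        total, count = self.cells.get((dims, values), (0.0, 0))
        return total, count


cube = Cube()
cube.load({"site": "a", "country": "US", "hour": 10, "clicks": 3})
cube.load({"site": "a", "country": "DE", "hour": 10, "clicks": 2})
cube.load({"site": "b", "country": "US", "hour": 11, "clicks": 5})
print(cube.query(site="a"))             # sum and count for site "a"
print(cube.query(country="US", hour=11))
```

The trade-off is visible even in this toy: each fact costs 2^n cell
updates for n dimensions at load time, in exchange for constant-time
lookups at query time, and only aggregates that can be updated
incrementally (sums, counts and what can be derived from them) fit this
scheme directly.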