hbase-user mailing list archives

From Bradford Stephens <bradfordsteph...@gmail.com>
Subject Re: Handling Interactive versus Batch Calculations
Date Tue, 02 Mar 2010 04:36:36 GMT
Hey Nenshad --

I think Jonathan Gray began working on something similar to this a few
months ago for Streamy.

As JD said, Coprocessors are very interesting, and I think they're
worth looking at (or contributing a patch for!) if you basically need
to use HBase as a "Giant Spreadsheet", such as
(Row,Column)->Value->Result. Building the functionality is a
considerable task, so I don't think you'll see it in a release from
the main contributors soon. I could be wrong.
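
To make the idea concrete, here is a minimal sketch of the pattern in
plain Python. It is not the HBase or Coprocessor API: the `regions`
dict, `local_sum`, and `scatter_gather_sum` are hypothetical names, and
threads merely stand in for the nodes that hold the data. Each "node"
sums its own rows locally, and only the partial sums travel back to the
caller to be combined.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: each "region" holds the rows stored on one node.
regions = {
    "node1": {"A": 10, "B": 20, "C": 30, "D": 40},
    "node2": {"E": 5, "F": 15, "G": 25},
}


def local_sum(rows):
    """Runs next to the data: reduce one region to a single number."""
    return sum(rows.values())


def scatter_gather_sum(regions):
    """Fan the work out to every region, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        partials = list(pool.map(local_sum, regions.values()))
    return sum(partials)


if __name__ == "__main__":
    print(scatter_gather_sum(regions))
```

The point of the design is that only one number per node crosses the
network, rather than every row.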

If you need to do a real-time query/calculation on a certain subset of
data, that's where our platform may help, for example "Sum of all
transactions where UserName=Jimmy and ZipCode=98104".
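
As a rough illustration of what that kind of query computes (not of how
any particular platform implements it), here is a small in-memory
Python sketch. The `transactions` list, the `AmountCents` field, and
the `sum_where` helper are all made up for the example.

```python
# Hypothetical sample data; amounts are in cents to keep the sums exact.
transactions = [
    {"UserName": "Jimmy", "ZipCode": "98104", "AmountCents": 1250},
    {"UserName": "Jimmy", "ZipCode": "98104", "AmountCents": 800},
    {"UserName": "Jimmy", "ZipCode": "10001", "AmountCents": 300},
    {"UserName": "Alice", "ZipCode": "98104", "AmountCents": 990},
]


def sum_where(rows, field, **equals):
    """Sum `field` over the rows whose columns match every key=value pair."""
    return sum(
        row[field]
        for row in rows
        if all(row.get(key) == value for key, value in equals.items())
    )


if __name__ == "__main__":
    print(sum_where(transactions, "AmountCents",
                    UserName="Jimmy", ZipCode="98104"))
```

A real system would answer this from an index or a pre-aggregated view
instead of scanning every row; the scan here only pins down the
semantics.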

I'd be happy to talk more about Coprocessors if you want more details :)


On Sun, Feb 28, 2010 at 11:56 AM, Nenshad Bardoliwalla
<nenshad@gmail.com> wrote:
> Hello All,
> This is my first message to the list, so please feel free to refer me to
> other posts, blogs, etc. to get me up to speed.  I understand that HBase and
> MapReduce work side-by-side to each other, that is, that they can feed each
> other data.  I have two sets of use cases for my application: one which
> requires batch style calculations in parallel, which MapReduce is perfect
> for, and one which requires interactive calculations, which I'm not sure how
> to accomplish in HBase.  By interactive calculation, I mean that a user
> makes a request to HBase which requires some transformation of the data
> in HDFS (say an aggregation or an allocation) and wants the results returned
> immediately.  Here are my questions:
> 1.  What is the mechanism by which you can build your own calculations that
> return results quickly in HBase?  Is it just Java classes or some other
> technique?
> 2.  For these types of calculations, does HBase handle acquiring the data if
> it's distributed across multiple boxes like MapReduce does, or do I have to
> write my own algorithms that seek out the data on the right nodes?
> 3.  Is it possible to break up the work across multiple nodes and then bring
> it together like a MapReduce, but without the performance penalty of using
> the MapReduce framework?  In other words, if HBase knows that files A-D are
> on node 1, E-G are on node 2, can I write a function that says "sum up X on
> node 1 locally and y on node 2 locally" and bring it back to me combined?
> 4.  Are there ways to guarantee that the computation will happen in-memory
> on the local column store, or is this the only place that such calculations
> happen?
> Apologies for what must be very basic questions.  Any pointers really
> appreciated.  Thank you.
> Best Regards,
> Nenshad
> --
> Nenshad D. Bardoliwalla
> Twitter: http://twitter.com/nenshad
> Book: http://www.driventoperform.net
> Blog: http://bardoli.blogspot.com

http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science
