Date: Mon, 1 Mar 2010 20:36:36 -0800
Subject: Re: Handling Interactive versus Batch Calculations
From: Bradford Stephens
To: hbase-user@hadoop.apache.org

Hey Nenshad --

I think Jonathan Gray began working on something similar to this a few
months ago for Streamy. As JD said, Coprocessors are very interesting, and
I think they're worth looking at (or contributing a patch for!) if you
basically need to use HBase as a "Giant Spreadsheet", such as
(Row,Column)->Value->Result.

Building the functionality is a considerable task, so I don't think you'll
see it in a release from the main contributors soon. I could be wrong.

If you need to do a real-time query/calculation on a certain subset of
data, that's where our platform may help, for example "Sum of all
transactions where UserName=Jimmy and ZipCode=98104".

I'd be happy to talk more about Coprocessors if you want more details :)

Cheers,
Bradford

On Sun, Feb 28, 2010 at 11:56 AM, Nenshad Bardoliwalla wrote:
> Hello All,
>
> This is my first message to the list, so please feel free to refer me to
> other posts, blogs, etc. to get me up to speed. I understand that HBase and
> MapReduce work side-by-side to each other, that is, that they can feed each
> other data.
> I have two sets of use cases for my application: one which
> requires batch style calculations in parallel, which MapReduce is perfect
> for, and one which requires interactive calculations, which I'm not sure how
> to accomplish in HBase. By interactive calculation, I mean that a user
> makes a request to HBase which requires some data transformation of the data
> in HDFS (say an aggregation or an allocation) and wants the results returned
> immediately. Here are my questions:
>
> 1. What is the mechanism by which you can build your own calculations that
> return results quickly in HBase? Is it just Java classes or some other
> technique?
> 2. For these types of calculations, does HBase handle acquiring the data if
> it's distributed across multiple boxes like MapReduce does, or do I have to
> write my own algorithms that seek out the data on the right nodes?
> 3. Is it possible to break up the work across multiple nodes and then bring
> it together like a MapReduce, but without the performance penalty of using
> the MapReduce framework? In other words, if HBase knows that files A-D are
> on node 1, E-G are on node 2, can I write a function that says "sum up X on
> node 1 locally and y on node 2 locally" and bring it back to me combined?
> 4. Are there ways to guarantee that the computation will happen in-memory
> on the local column store, or is this the only place that such calculations
> happen?
>
> Apologies for what must be very basic questions. Any pointers really
> appreciated. Thank you.
>
> Best Regards,
>
> Nenshad
>
> --
> Nenshad D. Bardoliwalla
> Twitter: http://twitter.com/nenshad
> Book: http://www.driventoperform.net
> Blog: http://bardoli.blogspot.com
>

-- 
http://www.drawntoscalehq.com -- The intuitive, cloud-scale data solution. Process, store, query, search, and serve all your data.
http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
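
For readers of the archive, here is a rough sketch of the pattern behind question 3 and the "Sum of all transactions where UserName=Jimmy and ZipCode=98104" example: each node sums only the rows it holds, and the caller combines the partial sums. This is plain Python with no HBase calls. The node names, the row fields (including "amount"), and the functions local_sum and distributed_sum are all made up for illustration, and a thread pool stands in for work that would really run on separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for rows held on two separate nodes.
# Nothing here talks to HBase; names and fields are illustrative only.
NODES = {
    "node1": [
        {"UserName": "Jimmy", "ZipCode": "98104", "amount": 10.0},
        {"UserName": "Jimmy", "ZipCode": "10001", "amount": 99.0},
    ],
    "node2": [
        {"UserName": "Jimmy", "ZipCode": "98104", "amount": 5.5},
        {"UserName": "Ana", "ZipCode": "98104", "amount": 7.0},
    ],
}


def local_sum(rows, user, zip_code):
    """Runs 'on the node': filter and sum only the rows stored there."""
    return sum(
        r["amount"]
        for r in rows
        if r["UserName"] == user and r["ZipCode"] == zip_code
    )


def distributed_sum(nodes, user, zip_code):
    """Scatter the local sums across nodes, then gather and combine them."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(
            lambda rows: local_sum(rows, user, zip_code), nodes.values()
        )
        return sum(partials)


total = distributed_sum(NODES, "Jimmy", "98104")
print(total)
```

Only one number per node crosses the "network" here, which is the point of pushing the calculation to where the data lives instead of pulling every row back to the client.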