Date: Thu, 21 Jan 2010 11:17:35 -0500
Subject: Re: learning hbase - schema design advice
From: Edward Capriolo
To: hbase-user@hadoop.apache.org

On Thu, Jan 21, 2010 at 1:09 AM, Dan Washusen wrote:
> Have you read the bigtable paper linked off the front page of HBase? It
> does a good job of explaining the concepts. Basically it's a distributed
> sorted map (think java.util.NavigableMap but split over many machines). If
> you know the key of the row you are looking for HBase can fetch it very
> quickly. If you don't know the key you'll have to resort to scanning all
> the rows to find the data you are interested in (just like a SQL query that
> can't take advantage of an index)...
>
> Do the queries need to immediately reflect any writes or is it sufficient
> for them to become eventually consistent? If you can live with eventual
> consistency then you could write some map reduce jobs that duplicate a
> master table into reporting tables (like you would for data
> warehousing/reporting on a RDBMS).
>
> I'm sure some of the more experienced users will have more insight but that
> might get you started...
>
> Cheers,
> Dan
>
> p.s. bold text doesn't seem to come through the mailing list...
>
> 2010/1/21 canucks
>
>>
>> Hi,
>>
>> i'm pretty interested in learning hbase.
>> what i want to do is store
>> financial data for analytical/graphing/displaying purposes. there are
>> hundreds of millions of rows and of course, i want fast response when
>> retrieving the data.
>>
>> if i were to do it in a RDBMS it would be
>> REPORT, MARKET, OPERATING_DATE, OPERATING_INTERVAL, HOUR_ENDING
>> VALUE
>> where the bolded column names are PK. if i were to store this in hbase
>> would it look like this?
>>
>> REPORT.MARKET.OPERATING_DATE.OPERATING_INTERVAL.HOUR_ENDING.TIMESTAMP{
>>        VALUE: 92.29
>> }
>>
>> so that i can do queries like below:
>> - give me all reports with the name of "ABC"
>> - give me all the values where OPERATING_DATE is from jan-01-2010 to
>>   jan-10-2010
>> - give me all the values where OPERATING_DATE is from jan-01-2010 to
>>   jan-10-2010 and HOUR_ENDING is between 5 and 10 (or simply 5 or
>>   variations thereof)
>>
>> in short, is hbase the wrong way to go about it or would it yield better
>> performance? also, do you folks happen to know any good links/articles on
>> hbase table & schema?
>>
>> thanks
>> --
>> View this message in context:
>> http://old.nabble.com/learning-hbase---schema-design-advice-tp27252203p27252203.html
>>
>

I went looking for a paper on "how to convert my RDBMS mindset to a
key-value store mindset". Here is something that got me started:

http://s-expressions.com/2009/03/08/hbase-on-designing-schemas-for-column-oriented-data-stores/
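
To make the "distributed sorted map" idea concrete, here is a rough sketch
that uses a plain java.util.TreeMap as a stand-in for an HBase table. It is
not HBase API code. The class name RowKeySketch, the market name "EAST", the
yyyyMMdd date encoding, the zero-padding, and all values other than 92.29
are assumptions made up for illustration. It shows how the three queries in
the original question map onto key-range scans over a composite row key.

```java
import java.util.Locale;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class RowKeySketch {

    // Composite row key. Dates are assumed to be yyyyMMdd and numbers are
    // zero-padded so that lexicographic order matches chronological order.
    public static String rowKey(String report, String market, String date,
                                int interval, int hourEnding) {
        return String.format(Locale.ROOT, "%s.%s.%s.%02d.%02d",
                report, market, date, interval, hourEnding);
    }

    // Range scan: every row with startKey <= key < stopKey.
    public static NavigableMap<String, Double> scan(
            NavigableMap<String, Double> table, String startKey, String stopKey) {
        return table.subMap(startKey, true, stopKey, false);
    }

    // Prefix scan: the stop key is the (non-empty) prefix with its last
    // character bumped by one, so "ABC." scans up to but excluding "ABC/".
    public static NavigableMap<String, Double> prefixScan(
            NavigableMap<String, Double> table, String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        String stop = prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
        return scan(table, prefix, stop);
    }

    // HOUR_ENDING is the last key component, so it cannot narrow a key range
    // that spans several dates. Filter the scanned rows instead.
    public static NavigableMap<String, Double> filterHourEnding(
            NavigableMap<String, Double> rows, int from, int to) {
        NavigableMap<String, Double> out = new TreeMap<>();
        for (Map.Entry<String, Double> e : rows.entrySet()) {
            String key = e.getKey();
            int hour = Integer.parseInt(key.substring(key.lastIndexOf('.') + 1));
            if (hour >= from && hour <= to) {
                out.put(key, e.getValue());
            }
        }
        return out;
    }

    // Hypothetical sample rows; only 92.29 comes from the original post.
    public static NavigableMap<String, Double> sampleTable() {
        NavigableMap<String, Double> table = new TreeMap<>();
        table.put(rowKey("ABC", "EAST", "20100105", 1, 7), 92.29);
        table.put(rowKey("ABC", "EAST", "20100110", 1, 12), 88.10);
        table.put(rowKey("ABC", "EAST", "20100115", 1, 6), 90.00);
        table.put(rowKey("XYZ", "EAST", "20100105", 1, 7), 75.50);
        return table;
    }

    public static void main(String[] args) {
        NavigableMap<String, Double> table = sampleTable();

        // Query 1: all rows for report "ABC".
        System.out.println(prefixScan(table, "ABC."));

        // Query 2: report ABC, market EAST, jan-01-2010 through jan-10-2010.
        // The stop key is the day after the last wanted date (exclusive).
        NavigableMap<String, Double> range =
                scan(table, "ABC.EAST.20100101", "ABC.EAST.20100111");
        System.out.println(range);

        // Query 3: same range, HOUR_ENDING between 5 and 10.
        System.out.println(filterHourEnding(range, 5, 10));
    }
}
```

HBase scans take a start row and a stop row in much the same way. Note the
trade-off in this key order: a date range only becomes a single contiguous
scan once REPORT and MARKET are fixed. Asking for a date range across all
reports would need a different key order, or a second table keyed by date,
which is where Dan's suggestion of reporting tables built by map reduce
jobs comes in.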