Date: Thu, 25 Aug 2011 10:53:07 -0400
Subject: Re: schema help
From: Rita
To: Ian Varley
Cc: user@hbase.apache.org

Thanks for your response. 30 million rows is the best case :-)

A couple of questions about using [fieldA][time] as my key:

Would I have to insert in order?
If not, how would HBase know when to stop, rather than scanning the entire
table?
What would a query actually look like if my key was [fieldA time]?

As a matter of fact, I can make 100% of my queries fit this pattern. I will
leave the other 5% out of my project/schema.

On Thu, Aug 25, 2011 at 10:13 AM, Ian Varley wrote:

> Rita,
>
> There's no need to create separate tables here--the table is really just a
> "namespace" for keys. A better option would probably be having one table
> with "[fieldA][time]" (the two fields concatenated) as your row key. Then,
> you can seek directly to the start of your records in constant time, and
> then scan forward until you get to the end of the data (linear time in the
> size of data you expect to get back).
>
> The downside of this is that for the 5% of your queries that aren't in this
> form, you may have to do a full table scan. (Alternatively, you could also
> maintain secondary indexes that help you get the data back with less than a
> full table scan; that would depend on the nature of the queries.)
>
> In general, a good rule of thumb when designing a schema in HBase is: think
> first about how you'd ideally like to access the data. Then structure the
> data to match that access pattern. (This is obviously not ideal if you have
> lots of different access patterns, but then, that's what relational
> databases are for. Most commercial relational DBs wouldn't blink at doing
> analytical queries against 30 million rows.)
>
> Ian
>
> On Aug 25, 2011, at 9:03 AM, Rita wrote:
>
> Hello,
>
> I am trying to solve a time-related problem. I can certainly use OpenTSDB
> for this, but was wondering if anyone had a clever way to create this type
> of schema.
>
> I have an inventory table:
>
> time (unix epoch), fieldA, fieldB, data
>
> There are about 30 million of these entries.
>
> 95% of my queries will look like this:
> show me where fieldA=zCORE from range [1314180693 to now]
>
> For fieldA, there are 4000 possible unique items.
> For fieldB, there are 2 possible unique items (bool).
>
> So I was thinking of creating 4000*2 tables and placing the data in them
> so I can easily scan.
>
> Any thoughts about this? Will HBase freak out if I have 8000 tables?
>
> --
> --- Get your facts first, then you can distort them as you please.--

--
--- Get your facts first, then you can distort them as you please.--
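
A minimal sketch of the [fieldA][time] row key from the thread, written as a pure-Python simulation rather than against the HBase client API. It relies on one fact about HBase: rows are stored sorted by row key, so insertion order does not matter, and a scan bounded by a start row and a stop row reads only the keys in that range. Everything else here is an assumption made for illustration: the `row_key` helper, the zero-byte separator (which assumes fieldA never contains a zero byte), the 8-byte big-endian time encoding, and the `SortedTable` class standing in for a table.

```python
import bisect
import struct


def row_key(field_a: str, epoch_seconds: int) -> bytes:
    """Build a hypothetical [fieldA][time] key.

    The zero-byte separator keeps "zCORE" rows apart from "zCOREX" rows,
    and the big-endian encoding makes byte order match numeric time order.
    """
    return field_a.encode("utf-8") + b"\x00" + struct.pack(">Q", epoch_seconds)


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by key."""

    def __init__(self):
        self._keys = []    # row keys, always kept in sorted order
        self._values = {}  # row key -> stored value

    def put(self, key: bytes, value):
        # Rows may arrive in any order; the key decides where they live.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start_row: bytes, stop_row: bytes):
        # Seek to the first key >= start_row, then read forward and stop
        # at the first key >= stop_row. Nothing outside the range is read.
        i = bisect.bisect_left(self._keys, start_row)
        while i < len(self._keys) and self._keys[i] < stop_row:
            key = self._keys[i]
            yield key, self._values[key]
            i += 1


# Insert deliberately out of order, with neighbouring fieldA values mixed in.
table = SortedTable()
table.put(row_key("zCORE", 1314180700), "c")
table.put(row_key("aEDGE", 1314180695), "x")
table.put(row_key("zCORE", 1314180600), "a")   # earlier than the query range
table.put(row_key("zCORE", 1314180693), "b")
table.put(row_key("zCOREX", 1314180699), "y")  # different fieldA, same prefix

# "show me where fieldA=zCORE from range [1314180693 to now]"
now = 1314283987
hits = [value for _, value in table.scan(row_key("zCORE", 1314180693),
                                         row_key("zCORE", now + 1))]
print(hits)  # ['b', 'c']
```

The stop row uses `now + 1` because the stop bound in this sketch is exclusive. In a real table the same query would be a single scan with those two keys as its bounds, assuming the keys are encoded so that byte order matches the intended sort order.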