From: Sreepathi
To: user@hbase.apache.org
Date: Tue, 29 Apr 2014 20:32:59 -0700
Subject: Re: Help with row and column design

I guess you can pre-split tables manually, which avoids hotspotting.

On Tue, Apr 29, 2014 at 8:08 PM, Software Dev wrote:
> Any improvements in the row key design?
>
> If I always know we will be querying by country, could/should I prefix
> the row key with the country to help with hotspotting?
>
> FR/2014042901
> FR/2014042902
> ....
> US/2014042901
> US/2014042902
> ...
>
> Is this preferred over adding it in a column, i.e. 2014042901:Country:US?
>
> On Tue, Apr 29, 2014 at 8:05 PM, Software Dev wrote:
> > Ok, didn't know if the sheer number of gets would be a limiting factor.
> > Thanks
> >
> > On Tue, Apr 29, 2014 at 7:57 PM, Ted Yu wrote:
> >> As I said this afternoon:
> >> See the following API in HTable for batching Gets:
> >>
> >>   public Result[] get(List<Get> gets) throws IOException
> >>
> >> Cheers
> >>
> >> On Tue, Apr 29, 2014 at 7:45 PM, Software Dev <static.void.dev@gmail.com> wrote:
> >>
> >>> Nothing against your code. I just meant that if we are doing a scan,
> >>> say for hourly metrics across a 6-month period, we are talking about
> >>> 4K+ gets. Is that something that can easily be handled?
> >>>
> >>> On Tue, Apr 29, 2014 at 5:08 PM, Rendon, Carlos (KBB) <CRendon@kbb.com> wrote:
> >>> >> Gets a bit hairy when doing say a shitload of gets though.. no?
> >>> >
> >>> > If by "hairy" you mean the code is ugly, it was written for maximal
> >>> > clarity.
> >>> > I think you'll find a few sensible loops make it fairly clean.
> >>> > Otherwise I'm not sure what you mean.
> >>> >
> >>> > -----Original Message-----
> >>> > From: Software Dev [mailto:static.void.dev@gmail.com]
> >>> > Sent: Tuesday, April 29, 2014 5:02 PM
> >>> > To: user@hbase.apache.org
> >>> > Subject: Re: Help with row and column design
> >>> >
> >>> >> Yes. See total_usa vs. total_female_usa above. Basically you have to
> >>> >> pre-store every level of aggregation you care about.
> >>> >
> >>> > Ok, I think this makes sense. Gets a bit hairy when doing say a
> >>> > shitload of gets though.. no?
> >>> >
> >>> > On Tue, Apr 29, 2014 at 4:43 PM, Rendon, Carlos (KBB) <CRendon@kbb.com> wrote:
> >>> >> You don't do a scan, you do a series of gets, which I believe you
> >>> >> can batch into one call.
> >>> >>
> >>> >> Last-5-days query in pseudocode:
> >>> >> res1 = Get( hash("2014-04-29") + "2014-04-29" )
> >>> >> res2 = Get( hash("2014-04-28") + "2014-04-28" )
> >>> >> res3 = Get( hash("2014-04-27") + "2014-04-27" )
> >>> >> res4 = Get( hash("2014-04-26") + "2014-04-26" )
> >>> >> res5 = Get( hash("2014-04-25") + "2014-04-25" )
> >>> >>
> >>> >> For each result you look for the particular column or columns you
> >>> >> are interested in:
> >>> >> Total_usa = res1.get("c:usa") + res2.get("c:usa") + res3.get("c:usa") + ...
> >>> >> Total_female_usa = res1.get("c:usa:sex:f") + ...
> >>> >>
> >>> >> "What happens when we add more fields? Do we just keep adding in more
> >>> >> column qualifiers? If so, how would we filter across columns to get an
> >>> >> aggregate total?"
> >>> >>
> >>> >> Yes. See total_usa vs. total_female_usa above. Basically you have to
> >>> >> pre-store every level of aggregation you care about.
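The last-5-days pseudocode above can be sketched as plain Java. The salt function and key layout here are illustrative assumptions, not something the thread prescribes; the HBase client calls that would consume the keys are shown only in comments, since they need a live cluster.

```java
import java.util.ArrayList;
import java.util.List;

public class DailyCounterKeys {
    // Illustrative salt: one byte of the date string's hashCode, rendered as
    // two hex digits. Any stable hash works; it only has to spread writes so
    // consecutive dates do not all land on one region.
    static String salt(String date) {
        return String.format("%02x", date.hashCode() & 0xFF);
    }

    // Build the row keys for a list of dates: hash(date) + date,
    // exactly as in the pseudocode above.
    static List<String> rowKeys(List<String> dates) {
        List<String> keys = new ArrayList<>();
        for (String d : dates) {
            keys.add(salt(d) + d);
        }
        return keys;
    }

    // Against a cluster these keys become one batched call, e.g.:
    //   List<Get> gets = new ArrayList<>();
    //   for (String k : keys) gets.add(new Get(Bytes.toBytes(k)));
    //   Result[] results = table.get(gets);  // batched into per-regionserver RPCs
}
```

Note the trade-off the thread already points out: because each date hashes to a different prefix, a time range cannot be a single Scan; it has to be materialized as one Get per day.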
> >>> >>
> >>> >> -----Original Message-----
> >>> >> From: Software Dev [mailto:static.void.dev@gmail.com]
> >>> >> Sent: Tuesday, April 29, 2014 4:36 PM
> >>> >> To: user@hbase.apache.org
> >>> >> Subject: Re: Help with row and column design
> >>> >>
> >>> >>> The downside is it still has a hotspot when inserting, but when
> >>> >>> reading a range of time it does not.
> >>> >>
> >>> >> How can you do a scan query between dates when you hash the date?
> >>> >>
> >>> >>> Column qualifiers are just the collection of items you are
> >>> >>> aggregating on. Values are increments. In your case qualifiers might
> >>> >>> look like c:usa, c:usa:sex:m, c:usa:sex:f, c:italy:sex:m,
> >>> >>> c:italy:sex:f, c:italy.
> >>> >>
> >>> >> What happens when we add more fields? Do we just keep adding in more
> >>> >> column qualifiers? If so, how would we filter across columns to get an
> >>> >> aggregate total?

--
Regards,
Sreepathi
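Tying the two suggestions together, country-prefixed row keys plus Sreepathi's manual pre-splitting, a minimal sketch follows. The split-point scheme and the fixed country list are assumptions for illustration; the `createTable` call is commented because it needs an HBase `Admin` handle.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CountryPrefixedKeys {
    // Row key layout from the thread: COUNTRY/yyyymmddHH, e.g. "US/2014042901".
    static String rowKey(String country, String dateHour) {
        return country + "/" + dateHour;
    }

    // One split point per country prefix, sorted, so each country starts its
    // own region at table creation. Writes still hotspot on "today" within a
    // country, but at least each country's hotspot lands on its own region.
    static List<String> splitPoints(List<String> countries) {
        return countries.stream()
                .sorted()
                .map(c -> c + "/")
                .collect(Collectors.toList());
    }

    // With a cluster, the pre-split happens at creation time, e.g.:
    //   byte[][] splits = ...;  // splitPoints(...) converted to byte arrays
    //   admin.createTable(tableDescriptor, splits);
}
```

Because the country prefix is a plain literal (unlike the hashed-date scheme above), a per-country time range remains a single contiguous Scan from `US/2014042900` to `US/2014043000`.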