Date: Fri, 5 Jul 2013 09:53:47 -0400
Subject: Re: When to expand vertically vs. horizontally in Hbase
From: Aji Janis
To: user

Asaf,

I am using the Genre/Author stuff as an example, but yes, at the moment I only have 5 column families. However, over time I may have more (no upper limit decided at this point). See below for more responses.

On Wed, Jul 3, 2013 at 3:42 PM, Asaf Mesika wrote:

> Do you have only 5 static author names?
> Keep in mind the column family name is defined when creating the table.
>
> Regarding tall vs wide debate:
> HBase is first and foremost a Key Value database, thus reads and writes in
> the column-value level. So it doesn't really care about rows.
> But it's not entirely true. Rows come into play in the following
> situations:
> Splitting a region is per row and not per column, thus a row will be saved
> as a whole on a region. If you have a really large row, the region size
> granularity is dependent on it. It doesn't seem to be the case here.
> Put/Delete creates a lock until finished. If you are intensive on inserts
> to the same row at the same time, this might be bad for you; keeping your
> rows slimmer can reduce contention, but again, only if you make a lot of
> concurrent modifications to the same row.
>

I expect batches of Put/Delete to the same row to happen by at most one thread at a time, based on users' current behavior. So locking shouldn't be an issue.
However, I am not sure whether saving a row to a region with enough space is really an issue I need to worry about (probably because I just don't know much about it yet).

> Filtering - if you need a filter which need all the row (there is a method
> you override in Filter to mark that) than a far row will be more memory
> intensive. If you needed only 1/5 of your row, than maybe splitting it to 5
> rows to begin with would have made a better schema design in terms of
> memory and I/O.
>

Currently, my access pattern is to get all data for a given row. It's possible that in the future we may want to apply (family/qualifier) filters. There is a lot of uncertainty about use cases (client side) at this point, which is why I am not entirely sure how things will look months from now. I am not sure I follow this statement: "if you need a filter which need all the row (there is a method you override in Filter to mark that) than a far row will be more memory intensive." Can you please explain?

Thank you for these suggestions btw, good food for thought!

>
> On Wednesday, July 3, 2013, Aji Janis wrote:
>
> > I have a major typo in the question so I apologize. I meant to say 5
> > families with 1000+ qualifiers each.
> >
> > Let's work with an example (not the greatest example here, but still).
> > Let's say we have a Genre Class like this:
> >
> > Class HistoryBooks{
> >
> > ArrayList author1;
> > ArrayList author2;
> > ArrayList author3;
> > ArrayList author4;
> > ArrayList author5;
> >
> > ...}
> >
> > Each author is a column family (let's say we only allow 5 authors per
> > Book class). Book per author ends up being the qualifier. In this case, I
> > know I have a max family count but my qualifiers have no upper limit. So is
> > this scenario a case for tall or wide table? Why? Thank you.
> >
> >
> > On Tue, Jul 2, 2013 at 9:56 AM, Bryan Beaudreault wrote:
> >
> > > If they are accessed mostly together they should all be a single column
> > > family.
> > > The key with tall or wide is based on the total byte size of each
> > > KeyValue. Your cells would need to be quite large for 50 to become a
> > > problem. I still would recommend using a single CF though.
> > > --
> > > Sent from iPhone
>
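
For readers following the tall vs. wide question in this thread, here is a rough sketch of the two layouts for the Genre/Author/Book example. It is only a plain-Java model built on sorted maps and does not call the HBase client API. The class name `TallVsWideSketch`, the `|` separator, the single family `d`, and the sample book titles are all assumptions made for illustration; none of them come from the thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative model only: a "table" is a sorted map of row key -> cells,
// and cells are a sorted map of "family:qualifier" -> value.
final class TallVsWideSketch {
    // Hypothetical row-key separator, chosen only for this sketch.
    static final String SEP = "|";

    // Wide layout: one row per genre, one family per author,
    // one qualifier per book. The whole genre lives in a single row.
    static SortedMap<String, SortedMap<String, String>> wide(
            String genre, Map<String, List<String>> booksByAuthor) {
        SortedMap<String, String> cells = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : booksByAuthor.entrySet()) {
            for (String book : e.getValue()) {
                cells.put(e.getKey() + ":" + book, book);
            }
        }
        SortedMap<String, SortedMap<String, String>> table = new TreeMap<>();
        table.put(genre, cells);
        return table;
    }

    // Tall layout: one row per (genre, author, book), with a single
    // hypothetical family "d". Rows stay small and sort next to each other.
    static SortedMap<String, SortedMap<String, String>> tall(
            String genre, Map<String, List<String>> booksByAuthor) {
        SortedMap<String, SortedMap<String, String>> table = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : booksByAuthor.entrySet()) {
            for (String book : e.getValue()) {
                SortedMap<String, String> cells = new TreeMap<>();
                cells.put("d:book", book);
                table.put(genre + SEP + e.getKey() + SEP + book, cells);
            }
        }
        return table;
    }

    // Models a prefix scan over sorted row keys: [prefix, prefix + '\uffff').
    static SortedMap<String, SortedMap<String, String>> scanPrefix(
            SortedMap<String, SortedMap<String, String>> table, String prefix) {
        return table.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        Map<String, List<String>> books = new TreeMap<>();
        books.put("author1", Arrays.asList("bookA", "bookB"));
        books.put("author2", Arrays.asList("bookC"));

        System.out.println("wide rows: " + wide("history", books).size());
        System.out.println("tall rows: " + tall("history", books).size());
        System.out.println("author1 rows in tall layout: "
                + scanPrefix(tall("history", books), "history|author1|").size());
    }
}
```

In the wide form, "get all data for a given row" is a single-row read, but the row keeps growing as qualifiers are added and cannot be split across regions. In the tall form, the same data is read with a prefix scan, and a narrower read (one author) touches only the matching rows.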