Date: Fri, 25 May 2012 10:31:29 -0700
Subject: Of hbase key distribution and query scalability, again.
From: Dmitriy Lyubimov
To: user

Hello,

I'd like to collect opinions from HBase experts on query uniformity, and on whether HBase currently offers any advanced technique for coping with non-uniform query load beyond simply keeping the key distribution uniform.

I know we start with the statement that in order to scale queries, we need them uniformly distributed over the key space. The next advice people get is to use uniformly distributed keys. Then, the thinking goes, the query load will also be uniformly distributed among regions.

For what seems to be an embarrassingly long time, however, I was missing the point that uniformly distributed keys do not guarantee uniformly distributed queries, since key uniformity does not account for the skew of the queries over the key space itself. Under some circumstances this skew can be bad enough to create query hot spots in the cluster, which could have been avoided had region splits been balanced on query load rather than on data size per se. (I mean some sort of dynamic sampling of the query distribution in order to equalize the load, similar to how TotalOrderPartitioner samples random input data to estimate the key skew of the incoming data.)

To cut a long story short: is region size the only technique HBase currently has for balancing load, especially with respect to query load? Or are there more advanced techniques for doing that?

Thank you very much.
-Dmitriy
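
For illustration only, here is a minimal sketch of the effect described above. It is a toy simulation and does not call any HBase API. All names and numbers in it are assumptions made up for the example: rows are taken to be uniform over an integer key space, so equal-width ranges stand in for size-based splits, and 70% of the queries are assumed to hit the lowest 10% of the keys. It compares the per-region query counts under equal-width split points with those under split points taken at the quantiles of a random sample of the queried keys, which is roughly the sampling idea borrowed from TotalOrderPartitioner.

```python
import bisect
import random


def even_split_points(num_regions, key_space):
    """Split points that cut the key space into equal-width ranges.

    Stands in for size-based splitting under the assumption that rows
    are uniformly distributed over the key space.
    """
    return [key_space * i // num_regions for i in range(1, num_regions)]


def sampled_split_points(sampled_keys, num_regions):
    """Split points taken at the quantiles of a sample of queried keys."""
    ordered = sorted(sampled_keys)
    return [ordered[len(ordered) * i // num_regions]
            for i in range(1, num_regions)]


def load_per_region(query_keys, split_points):
    """Count queries per region; region i holds keys in [split[i-1], split[i])."""
    counts = [0] * (len(split_points) + 1)
    for key in query_keys:
        counts[bisect.bisect_right(split_points, key)] += 1
    return counts


def simulate(key_space=1_000_000, num_queries=200_000, num_regions=10,
             hot_fraction=0.1, hot_share=0.7, sample_size=5_000, seed=1):
    """Return (load under equal-width splits, load under query-sampled splits).

    Hypothetical workload: hot_share of the queries target the lowest
    hot_fraction of the key space; the rest are uniform over all keys.
    """
    rng = random.Random(seed)
    hot_limit = int(key_space * hot_fraction)
    queries = []
    for _ in range(num_queries):
        if rng.random() < hot_share:
            queries.append(rng.randrange(hot_limit))
        else:
            queries.append(rng.randrange(key_space))

    by_size = load_per_region(queries, even_split_points(num_regions, key_space))
    sample = rng.sample(queries, sample_size)
    by_load = load_per_region(queries, sampled_split_points(sample, num_regions))
    return by_size, by_load


if __name__ == "__main__":
    by_size, by_load = simulate()
    print("queries per region, equal-width splits:   ", by_size)
    print("queries per region, query-sampled splits: ", by_load)
```

In this toy setup the equal-width layout sends most of the queries to a single region even though every region holds the same amount of data, while the query-sampled layout spreads them roughly evenly. Whether and how something like this could be done inside HBase itself is exactly the open question of this message.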