Date: Fri, 18 Dec 2015 10:55:50 +0100
Subject: Adding RegionServers when salting
From: Marko Dinic
To: user@hbase.apache.org

Hi everyone,

I read about salting and how it is used for load balancing in the case of sequential keys. Basically, the salt should distribute sequential rows across different region servers.

I also read this article, which explains how to run MR jobs on tables that were salted. It advises generating the salt as:

    StringUtils.leftPad(Integer.toString(Math.abs(keyCore.hashCode() % numberOfRegions)), 3, "0") + "|" + logicalKey

So you basically take the hash of the original key and reduce it modulo the number of regions to get the salt. You also need to pre-split the table based on the salt, so that each region contains only rows with the same salt. All of this seems reasonable.

My question is, *what happens when you add more region servers*?
It is expected that you also increase the number of regions, so you would have to change the split strategy so that the new regions follow the "one-salt-for-all-rows-in-region" rule. You would also need to take the hash modulo the increased numberOfRegions.

All of that means that I could *mess up* queries when trying to get rows which were added when the number of regions was smaller. For example, at the beginning you could be dividing modulo 10 (10 regions), and later modulo 50 (now 50 regions), so the same key could get a different salt than the one it was stored under.

Can anyone please explain the full procedure for doing this salting/pre-splitting properly?

-- 
Marko Dinic
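
To make the concern concrete, here is a minimal sketch of the salt formula quoted above, evaluated for the same key with 10 and then 50 buckets. The class name SaltDemo, the helper names, and the sample key "k1" are made up for illustration, and plain string padding stands in for StringUtils.leftPad so that the snippet has no dependencies. It only illustrates the mismatch; it is not a proposed fix.

```java
public class SaltDemo {
    // Salt as in the quoted formula: abs(hash % buckets), left-padded to 3 digits.
    // Manual padding is an assumption standing in for StringUtils.leftPad.
    public static String salt(String logicalKey, int buckets) {
        String s = Integer.toString(Math.abs(logicalKey.hashCode() % buckets));
        while (s.length() < 3) {
            s = "0" + s;
        }
        return s;
    }

    // Full row key: salt, separator, then the original key.
    public static String saltedKey(String logicalKey, int buckets) {
        return salt(logicalKey, buckets) + "|" + logicalKey;
    }

    public static void main(String[] args) {
        String key = "k1"; // hypothetical logical key

        // Row key used when the row was written (10 regions).
        String written = saltedKey(key, 10);
        // Row key a reader would compute after growing to 50 regions.
        String lookedUp = saltedKey(key, 50);

        System.out.println("written with 10 buckets:   " + written);
        System.out.println("looked up with 50 buckets: " + lookedUp);
        System.out.println("same row key? " + written.equals(lookedUp));
    }
}
```

If the two row keys differ, a Get built with the new modulus would miss the row that was stored under the old one, which is exactly the situation I am worried about.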