From: Joarder KAMAL
Date: Wed, 3 Jul 2013 15:29:40 +1000
Subject: Re: Adding a new region server or splitting an old region in a Hash-partitioned HBase Data Store
To: user@hbase.apache.org
Cc: dev@hbase.apache.org

Dear Nick,

Thanks a lot for the nice explanation. I am left with just one small point of confusion.

You wrote: "Your logical partitioning remains the same whether it's being served by N, 2N, or 3.5N+2 RegionServers." When a region is split into two, I would guess the logical partitioning does change, right? Could you kindly clarify a bit more?

My question is: without making any change to my initial row-key-generation function, how will new writes end up in the new regions? I assume it is hard to predict the number of RegionServers up front, or to create pre-split regions (see the P.S. at the bottom for what I mean), in very large-scale production systems. I am not worried about the default load-balancing behaviour of HBase; St.Ack and you have already explained that clearly.

For example, as indicated in Lars George's book, where he used <# of RS> while computing the prefix, I guess <# of Regions> could also be used (if regions are pre-split):

------------------------------------------------------------------------
*Salting*
You can use a salting prefix to the key that guarantees a spread of all
rows across all region servers. For example:

byte prefix = (byte) (Long.hashCode(timestamp) % <# of RS>);
byte[] rowkey = Bytes.add(Bytes.toBytes(prefix), Bytes.toBytes(timestamp));

This formula will generate enough prefix numbers to ensure that rows are
sent to all region servers. Of course, the formula assumes a *specific*
number of servers, and if you are planning to *grow your cluster* you
should set this number to a *multiple* instead.
------------------------------------------------------------------------
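If I understand the last remark correctly, one way to decouple the prefix from the cluster size is to fix the number of salt buckets up front and never derive it from the number of servers or regions. Here is a rough sketch of what I mean (entirely my own, not from the book; the bucket count of 16 and the class name are made up):

import org.apache.hadoop.hbase.util.Bytes;

public class SaltedKeys {

  // Fixed number of salt buckets, chosen independently of the number of
  // RegionServers or regions. Splitting a region or adding a server does
  // not change this value, so existing keys never need to be rewritten.
  private static final int SALT_BUCKETS = 16;

  // Prepend a one-byte salt derived only from the original key material.
  public static byte[] saltedRowKey(long timestamp) {
    int salt = (Long.valueOf(timestamp).hashCode() & 0x7fffffff) % SALT_BUCKETS;
    return Bytes.add(new byte[] { (byte) salt }, Bytes.toBytes(timestamp));
  }
}

With something like this, a region split only divides an existing bucket's key range in two; the generated keys stay valid and new writes simply land in whichever daughter region covers their range.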
Thanks again ...

Regards,
Joarder Kamal


On 3 July 2013 03:37, Nick Dimiduk wrote:

> Hi Joarder,
>
> I think you're slightly confused about the impact of using a hashed (or
> sometimes called "salted") prefix for your rowkeys. This strategy for
> rowkey design has an impact on the logical ordering of your data, not
> necessarily the physical distribution of your data. In HBase, these are
> orthogonal concerns. It means that to execute a bucket-agnostic query, the
> client must initiate N scans. However, there's no guarantee that all
> regions starting with the same hash land on the same RegionServer. Region
> assignment is a complex beast; as I understand it, it's based on a
> randomish, load-based assignment.
>
> Take a look at your existing table distributed on a size-N cluster. Do all
> regions that fall within the first bucket sit on the same RegionServer?
> Likely not. However, look at the number of regions assigned to each
> RegionServer. This should be close to even. Adding a new RegionServer to
> the cluster will result in some of those regions migrating from the other
> servers to the new one. The impact will be a decrease in the average
> number of regions served per RegionServer.
>
> Your logical partitioning remains the same whether it's being served by N,
> 2N, or 3.5N+2 RegionServers. Your client always needs to execute that
> bucket-agnostic query as N scans, touching each of the N buckets. Precisely
> which RegionServers are touched by any given scan depends entirely on how
> the balancer has distributed load on your cluster.
>
> Thanks,
> Nick
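(Just to check my understanding of the "N scans" point above: a bucket-agnostic read would then look roughly like the sketch below. This is my own sketch, reusing the one-byte salt prefix and the 16 buckets from my earlier example; as Nick says, the client has to do this fan-out itself.)

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class BucketScan {

  private static final int SALT_BUCKETS = 16;

  // One scan per salt bucket: start row {salt}, stop row {salt + 1}
  // (exclusive) covers every row whose key begins with that prefix byte.
  public static List<Result> scanAllBuckets(HTable table) throws IOException {
    List<Result> results = new ArrayList<Result>();
    for (int salt = 0; salt < SALT_BUCKETS; salt++) {
      Scan scan = new Scan(new byte[] { (byte) salt },
                           new byte[] { (byte) (salt + 1) });
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          results.add(r);
        }
      } finally {
        scanner.close();
      }
    }
    return results;
  }
}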
> On Thu, Jun 27, 2013 at 5:02 PM, Joarder KAMAL wrote:
>
> > Thanks St.Ack for mentioning the load-balancer.
> >
> > But my question was two-fold:
> >
> > Case-1. If a new RS is added, the load-balancer will do its job,
> > assuming no new region has been created in the meantime. // As you've
> > already answered.
> >
> > Case-2. Whether or not a new RS is added, if an existing region is split
> > into two, how will the new writes go to the new region? Because, let's
> > say the hash function was initially calculated with *N* regions and now
> > there are *N+1* regions in the cluster.
> >
> > In that case, do I need to change the hash function and reshuffle all
> > the existing data within the cluster? Or does HBase have some mechanism
> > to handle this?
> >
> > Many thanks again for helping me out...
> >
> > Regards,
> > Joarder Kamal
> >
> > On 28 June 2013 02:26, Stack wrote:
> >
> > > On Wed, Jun 26, 2013 at 4:24 PM, Joarder KAMAL wrote:
> > >
> > > > Maybe a simple question for experienced HBase users and developers
> > > > to answer:
> > > >
> > > > If I use hash partitioning to evenly distribute write workloads
> > > > across my region servers, and later add a new region server to scale
> > > > or split an existing region, do I need to change my hash function and
> > > > re-shuffle all the existing data between all the region servers (old
> > > > and new)? Or is there a better solution for this? Any guidance would
> > > > be very much appreciated.
> > >
> > > You do not need to change your hash function.
> > >
> > > When you add a new regionserver, the balancer will move some of the
> > > existing regions to the new host.
> > >
> > > St.Ack
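P.S. Regarding "pre-split regions" above, this is roughly what I had in mind: creating the table with one split point per salt byte so that it starts out with one region per bucket. Again just a sketch of my own; the table name "events" and column family "d" are made up:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class PreSplitTable {

  private static final int SALT_BUCKETS = 16;

  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());

    HTableDescriptor desc = new HTableDescriptor("events");
    desc.addFamily(new HColumnDescriptor("d"));

    // One split key per salt value (salt 0 is the first region's implicit
    // start key), so the table begins life with SALT_BUCKETS regions whose
    // boundaries line up with the salt buckets.
    byte[][] splitKeys = new byte[SALT_BUCKETS - 1][];
    for (int salt = 1; salt < SALT_BUCKETS; salt++) {
      splitKeys[salt - 1] = new byte[] { (byte) salt };
    }

    admin.createTable(desc, splitKeys);
    admin.close();
  }
}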