From: "Kevin O'Dell" <kevin.odell@cloudera.com>
To: user@hbase.apache.org
Cc: lars hofhansl
Date: Thu, 2 Apr 2015 08:41:16 -0400
Subject: Re: introducing nodes w/ more storage

Hi Mike,

Sorry for the delay here.

> How does the HDFS load balancer impact the load balancing of HBase?

<-- The HDFS load balancer is not run automatically; it is a manual process that has to be kicked off. It is not recommended to *ever run the HDFS balancer on a cluster running HBase. Just as HBase has no concept of, or care about, the underlying storage, HDFS has no concept of the region layout, nor of the locality we worked so hard to build through compactions. Furthermore, once the HDFS balancer has saved us from running out of space on the smaller nodes, we will run a major compaction and re-write all of the HBase data right back to where it was before.

> One is the number of regions managed by a region server; that's HBase's load, right? And then there's the data distribution of HBase files that is really managed by the HDFS load balancer, right?

<-- Right, until we run a major compaction and "restore" locality by moving the data back.

> Even still... eventually the data will be distributed equally across the cluster. What's happening with the HDFS balancer? Is that heterogenous or homogenous in terms of storage?
<-- Not quite. As I said before, the HDFS balancer is manual, so it is quite easy to build up a skew, especially if you use a datanode as an edge node or a Thrift gateway, etc. Yes, the HDFS balancer is heterogenous, but it doesn't play nice with HBase.

*The use of the word "ever" should not be construed as a true absolute; it is being used to represent a best practice. In many cases the HDFS balancer does need to be run, especially in multi-tenant clusters with archive data. If the HDFS balancer is used, it is best to run a major compaction immediately afterwards to restore HBase locality.

On Mon, Mar 23, 2015 at 10:50 AM, Michael Segel wrote:
> @lars,
>
> How does the HDFS load balancer impact the load balancing of HBase?
>
> Of course there are two loads... one is the number of regions managed by a
> region server; that's HBase's load, right?
> And then there's the data distribution of HBase files that is really
> managed by the HDFS load balancer, right?
>
> OP's question is having a heterogenous cluster where he would like to see
> a more even distribution of data/free space based on the capacity of the
> newer machines in the cluster.
>
> This is a storage question, not a memory/cpu core question.
>
> Or am I missing something?
>
> -Mike
>
> > On Mar 22, 2015, at 10:56 PM, lars hofhansl wrote:
> >
> > Seems that it should not be too hard to add that to the stochastic load
> > balancer. We could add a spaceCost or something.
> >
> > ----- Original Message -----
> > From: Jean-Marc Spaggiari
> > To: user
> > Cc: Development
> > Sent: Thursday, March 19, 2015 12:55 PM
> > Subject: Re: introducing nodes w/ more storage
> >
> > You can extend the default balancer and assign the regions based on
> > that. But at the end, the replicated blocks might still go all over the
> > cluster, and your "small" nodes are going to be full and will not be able
> > to get any more writes, even for the regions they are supposed to get.
> >
> > I'm not sure there is a good solution for what you are looking for :(
> >
> > I built my own balancer, but because of differences in the CPUs, not
> > because of differences in the storage space...
> >
> > 2015-03-19 15:50 GMT-04:00 Nick Dimiduk :
> >
> >> Seems more fantasy than fact, I'm afraid. The default load balancer [0]
> >> takes store file size into account, but has no concept of capacity. It
> >> doesn't know that nodes in a heterogenous environment have different
> >> capacity.
> >>
> >> This would be a good feature to add, though.
> >>
> >> [0]:
> >> https://github.com/apache/hbase/blob/branch-1.0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
> >>
> >> On Tue, Mar 17, 2015 at 7:26 AM, Ted Tuttle wrote:
> >>
> >>> Hello-
> >>>
> >>> Sometime back I asked a question about introducing new nodes w/ more
> >>> storage than existing nodes. I was told at the time that HBase will not
> >>> be able to utilize the additional storage; I assumed at the time that
> >>> regions are allocated to nodes in something like a round-robin fashion
> >>> and the node with the least storage sets the limit for how much each
> >>> node can utilize.
> >>>
> >>> My question this time around has to do with nodes w/ unequal numbers of
> >>> volumes: Does HBase allocate regions based on nodes or volumes on the
> >>> nodes? I am hoping I can add a node with 8 volumes totaling 8X TB and
> >>> all the volumes will be filled, even though legacy nodes have 5 volumes
> >>> and total storage of 5X TB.
> >>>
> >>> Fact or fantasy?
> >>>
> >>> Thanks,
> >>> Ted
> >>>
> >>
> >
>
> The opinions expressed here are mine; while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com

--
Kevin O'Dell
Field Enablement, Cloudera
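[Archive note] The capacity-aware assignment this thread asks for boils down to weighting each node's region quota by its storage instead of splitting regions evenly, which is roughly what a spaceCost term in the StochasticLoadBalancer would optimize toward. Below is a minimal illustrative sketch in plain Python, not HBase code: the node names and region count are made up to match Ted's 5-volume vs. 8-volume scenario, and it uses a simple largest-remainder split.

```python
def assign_regions(capacities, num_regions):
    """Distribute num_regions across nodes in proportion to storage
    capacity (largest-remainder method), rather than a flat even split."""
    total = sum(capacities.values())
    quotas = {n: num_regions * c / total for n, c in capacities.items()}
    # start from the floor of each proportional quota
    assignment = {n: int(q) for n, q in quotas.items()}
    leftover = num_regions - sum(assignment.values())
    # hand the remaining regions to the nodes with the largest fractional parts
    by_remainder = sorted(quotas, key=lambda n: quotas[n] - int(quotas[n]),
                          reverse=True)
    for n in by_remainder[:leftover]:
        assignment[n] += 1
    return assignment

# Hypothetical cluster: three legacy nodes with 5 volumes (5X TB) each,
# one new node with 8 volumes (8X TB)
nodes = {"legacy1": 5, "legacy2": 5, "legacy3": 5, "new1": 8}
print(assign_regions(nodes, 230))
# -> {'legacy1': 50, 'legacy2': 50, 'legacy3': 50, 'new1': 80}
```

With the even split the default balancer converges on, each of the four nodes would carry about 58 regions and the 5X TB nodes would fill up first, which is exactly the skew the thread describes.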