Subject: Re: Stochastic Balancer by tables
From: Elliott Clark
To: "user@hbase.apache.org"
Date: Thu, 18 Jun 2015 09:39:44 -0700

The balancer is not responsible for region size decisions. It is only responsible for deciding which regionservers host which regions. Splits are determined by the data size of a region; see the max store file size setting (hbase.hregion.max.filesize).
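To make the split-versus-placement distinction concrete, here is a toy Python model. It is not HBase code: `Region`, `should_split`, `balance`, and `MAX_FILESIZE` are hypothetical names. `should_split` mimics a simplified constant-size rule tied to hbase.hregion.max.filesize (the split policies in real releases are more involved). `balance` stands in for a balancer and only spreads regions across servers by count; it is far simpler than the stochastic balancer, which weighs several cost functions. The point is that moving regions around never changes their size.

```python
from dataclasses import dataclass
from typing import Dict, List

# Illustrative threshold standing in for hbase.hregion.max.filesize (assumed value).
MAX_FILESIZE = 10 * 1024**3  # 10 GiB


@dataclass
class Region:
    name: str
    store_sizes: List[int]  # bytes per store (one store per column family)


def should_split(region: Region, max_filesize: int = MAX_FILESIZE) -> bool:
    """Simplified constant-size rule: split once any store exceeds the threshold."""
    return any(size > max_filesize for size in region.store_sizes)


def balance(regions: List[Region], servers: List[str]) -> Dict[str, List[str]]:
    """Toy balancer: round-robin regions across servers by count.

    It only decides placement; region sizes are untouched."""
    assignment: Dict[str, List[str]] = {s: [] for s in servers}
    for i, region in enumerate(regions):
        assignment[servers[i % len(servers)]].append(region.name)
    return assignment


if __name__ == "__main__":
    regions = [
        Region("r1", [2 * 1024**3]),
        Region("r2", [12 * 1024**3]),
        Region("r3", [1 * 1024**3]),
    ]
    print([r.name for r in regions if should_split(r)])  # ['r2']
    print(balance(regions, ["rs1", "rs2"]))  # {'rs1': ['r1', 'r3'], 'rs2': ['r2']}
```

Here only r2 crosses the size threshold and would split; the balancer leaves r2 just as large as before, it merely decides which server hosts it.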
On Thu, Jun 18, 2015 at 7:50 AM, Nasron Cheong wrote:
> Hi,
>
> I've noticed there are two settings available when using the HBase balancer
> (specifically the default stochastic balancer)
>
> hbase.master.balancer.stochastic.tableSkewCost
>
> hbase.master.loadbalance.bytable
>
> How do these two settings relate? The documentation indicates when using
> the stochastic balancer that 'bytable' should be set to false?
>
> Our deployment relies on very few, very large tables, and I've noticed bad
> distribution when accessing some of the tables. E.g. there are 443 regions
> for a single table, but when doing a MR job over a full scan of the table,
> the first 426 regions scan quickly (minutes), but the remaining 17 regions
> take significantly longer (hours)
>
> My expectation is to have the balancer equalize the size of the regions for
> each table.
>
> Thanks!
>
> - Nasron
>