Subject: Re: Automating major compactions
From: Behdad Forghani
To: user@hbase.apache.org
Date: Wed, 8 Jul 2015 12:18:43 -0500

To start a major compaction for tablename from the CLI, you need to run:

echo "major_compact 'tablename'" | hbase shell

I do this after bulk loading into the table. FYI, to avoid surprises, I also turn off the load balancer and rebalance regions manually.
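Note that major_compact only queues the compaction and returns immediately, so if you need to block until it has finished you can poll for completion. A minimal sketch, assuming the compaction_state shell command that recent HBase releases provide:

#!/usr/bin/env bash
# Sketch: trigger a major compaction, then poll until it has finished.
# Assumes the compaction_state shell command is available.
TABLE="$1"

echo "major_compact '${TABLE}'" | hbase shell

# Keep polling while the reported state still mentions a major compaction
# (MAJOR or MAJOR_AND_MINOR); compaction_state reports NONE once idle.
while echo "compaction_state '${TABLE}'" | hbase shell 2>/dev/null | grep -q MAJOR; do
  sleep 30
done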
The CLI command to turn off the balancer is:

echo balance_switch false | hbase shell

To rebalance regions after a bulk load or other changes, run:

echo balancer | hbase shell

You can run these two commands over ssh. I use Ansible for this. Assuming you have defined hbase_master in your hosts file, you can run (the shell module is needed because the command contains a pipe):

ansible -i hosts hbase_master -m shell -a "echo \"major_compact 'tablename'\" | hbase shell"

(A fuller end-to-end sketch follows after the quoted message below.)

Behdad Forghani

On Wed, Jul 8, 2015 at 8:03 AM, Dejan Menges wrote:

> Hi,
>
> What's the best way to automate major compactions without enabling it
> during an off-peak period?
>
> What I was testing is a simple script which runs on every node in the
> cluster, checks whether a major compaction is already running on that
> node, and if not picks one region and runs a compaction on just that
> region.
>
> It has been running for some time and has helped us get our data into
> much better shape, but now I'm not quite sure how to choose which region
> to compact. So far I was reading that node's rs-status#regionStoreStats
> and first choosing the regions with the largest number of storefiles, and
> then those with the largest storefile sizes.
>
> Is there maybe something more intelligent I could/should do?
>
> Thanks a lot!
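Putting the pieces above together, a rough wrapper that switches the balancer off, runs the major compaction, and then re-enables the balancer could look like the following. This is only a sketch: the script name compact_table.sh, the host name, and the wait step are placeholders, not anything taken from this thread.

#!/usr/bin/env bash
# compact_table.sh -- rough sketch; usage: ./compact_table.sh <table>
set -e
TABLE="$1"

# Switch the balancer off and queue the major compaction.
hbase shell <<EOF
balance_switch false
major_compact '${TABLE}'
EOF

# major_compact only queues the work; wait here until it has finished,
# for example with the compaction_state polling loop sketched earlier.

# Switch the balancer back on and rebalance.
hbase shell <<EOF
balance_switch true
balancer
EOF

You could then run it on the master over ssh with something like the following, where hbase-master and mytable are placeholders:

ssh hbase-master "bash -s mytable" < compact_table.sh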