hbase-user mailing list archives

From Dejan Menges <dejan.men...@gmail.com>
Subject Re: Automating major compactions
Date Wed, 08 Jul 2015 17:30:48 GMT
Hi Behdad,

Thanks a lot, but I already do this part. My question was more about which
metrics (exposed or not) to use to figure out most intelligently where a
major compaction is needed the most.

Currently, choosing the region with the largest number of store files and,
among those, the largest store file size is doing the job, but I wasn't sure
whether there is something better to choose from.
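
To make the heuristic above concrete, here is a minimal sketch in Python. It assumes the per-region numbers have already been scraped from rs-status#regionStoreStats into simple records; the RegionStats type, its field names, and the min_storefiles cutoff are hypothetical and only for illustration, not part of HBase.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RegionStats:
    """Hypothetical record holding the numbers read for one region."""
    name: str
    storefiles: int          # number of store files in the region
    storefile_size_mb: int   # total store file size in MB


def pick_region_to_compact(
    regions: Iterable[RegionStats],
    min_storefiles: int = 2,  # assumption: skip regions with fewer files
) -> Optional[RegionStats]:
    """Return the region with the most store files, breaking ties by size.

    Returns None when no region has at least min_storefiles store files.
    """
    candidates = [r for r in regions if r.storefiles >= min_storefiles]
    if not candidates:
        return None
    # Tuples compare element by element: file count first, then size.
    return max(candidates, key=lambda r: (r.storefiles, r.storefile_size_mb))
```

The returned region name would then be passed to major_compact for that single region. Whether a cutoff of two store files is sensible depends on the workload, so treat it as a placeholder.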

Cheers,
Dejan

On Wed, Jul 8, 2015 at 7:19 PM Behdad Forghani <behdad@exapackets.com>
wrote:

> To start a major compaction of a table from the CLI, run:
> echo "major_compact 'tablename'" | hbase shell
>
> I do this after bulk loading to the table.
>
> FYI, to avoid surprises, I also turn off the load balancer and rebalance
> regions manually.
>
> The cli command to turn off balancer is:
> echo balance_switch false | hbase shell
>
> To rebalance regions after a bulk load or other changes, run:
> echo balancer | hbase shell
>
> You can run these commands over ssh. I use Ansible for this.
> Assuming you have defined hbase_master in your hosts file, you can run:
> ansible -i hosts hbase_master -m shell -a "echo \"major_compact 'tablename'\" | hbase shell"
>
> Behdad Forghani
>
> On Wed, Jul 8, 2015 at 8:03 AM, Dejan Menges <dejan.menges@gmail.com>
> wrote:
>
> > Hi,
> >
> > What's the best way to automate major compactions without enabling it
> > during off peak period?
> >
> > What I was testing is a simple script that runs on every node in the
> > cluster, checks whether a major compaction is already running on that
> > node and, if not, picks one region and runs a compaction on it.
> >
> > It has been running for some time and it helped us get our data into much
> > better shape, but now I'm not quite sure how to choose which region to
> > compact. So far I have been reading rs-status#regionStoreStats for that
> > node, first choosing the regions with the largest number of storefiles,
> > and then those with the largest storefile sizes.
> >
> > Is there maybe something more intelligent I could/should do?
> >
> > Thanks a lot!
> >
>
