From: Jean-Marc Spaggiari
Date: Wed, 8 Jul 2015 16:48:20 -0400
Subject: Re: Automating major compactions
To: user@hbase.apache.org

Just missing the ColumnFamily at the end of the path. Your memory is pretty
good.

JM

2015-07-08 16:39 GMT-04:00 Vladimir Rodionov:

> You can find this info yourself, Dejan
>
> 1. Locate the table dir on HDFS
> 2. List all regions (directories)
> 3. Iterate over the files in each region directory and find the oldest one
>    (by creation time)
> 4. The region with the oldest file is your candidate for major compaction
>
> /HBASE_ROOT/data/namespace/table/region (if my memory serves me right :))
>
> -Vlad
>
> On Wed, Jul 8, 2015 at 1:07 PM, Dejan Menges wrote:
>
> > Hi Mikhail,
> >
> > Actually, the reason is quite stupid on my side - to avoid compacting one
> > region over and over again while others are waiting in line (reading the
> > HTML status page and sorting only on the number of store files eventually
> > leaves you with a bunch of regions that have exactly the same number of
> > store files).
> >
> > Thanks for this hint - this is exactly what I was looking for. I was
> > previously trying to figure out whether it's possible to query meta for
> > this information (we're currently on 0.98.0 and 0.98.4, waiting for HDP
> > 2.3 from Hortonworks to upgrade immediately), but for our current version
> > I didn't find that possible, which is why I decided to go this way.
> >
> > On Wed, Jul 8, 2015 at 10:02 PM Mikhail Antonov wrote:
> >
> > > I totally understand the reasoning behind compacting the regions with
> > > the biggest number of store files, but I didn't follow why it's best to
> > > compact regions which have the biggest store files - maybe I'm missing
> > > something? I'd rather compact the regions which have the smallest
> > > average storefile size.
> > >
> > > You may also want to take a look at
> > > https://issues.apache.org/jira/browse/HBASE-12859, and compact the
> > > regions for which a major compaction was last run the longest time ago.
> > >
> > > -Mikhail
> > >
> > > On Wed, Jul 8, 2015 at 10:30 AM, Dejan Menges wrote:
> > > > Hi Behdad,
> > > >
> > > > Thanks a lot, but this part I already do. My question was more about
> > > > what to use (which exposed or unexposed metrics) to figure out most
> > > > intelligently where major compaction is needed the most.
> > > >
> > > > Currently, choosing the region with the biggest number of store files
> > > > plus the biggest total store file size is doing the job, but I wasn't
> > > > sure if there's something better to choose from.
> > > >
> > > > Cheers,
> > > > Dejan
> > > >
> > > > On Wed, Jul 8, 2015 at 7:19 PM Behdad Forghani <behdad@exapackets.com>
> > > > wrote:
> > > >
> > > >> To start a major compaction for tablename from the CLI, you need to
> > > >> run:
> > > >> echo major_compact tablename | hbase shell
> > > >>
> > > >> I do this after bulk loading into the table.
> > > >>
> > > >> FYI, to avoid surprises, I also turn off the load balancer and
> > > >> rebalance regions manually.
> > > >>
> > > >> The CLI command to turn off the balancer is:
> > > >> echo balance_switch false | hbase shell
> > > >>
> > > >> To rebalance regions after a bulk load or other changes, run:
> > > >> echo balancer | hbase shell
> > > >>
> > > >> You can run these two commands over ssh; I use Ansible for this.
> > > >> Assuming you have defined hbase_master in your hosts file, you can
> > > >> run:
> > > >> ansible -i hosts hbase_master -a "echo major_compact tablename |
> > > >> hbase shell"
> > > >>
> > > >> Behdad Forghani
> > > >>
> > > >> On Wed, Jul 8, 2015 at 8:03 AM, Dejan Menges <dejan.menges@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Hi,
> > > >> >
> > > >> > What's the best way to automate major compactions without enabling
> > > >> > them only during an off-peak period?
> > > >> >
> > > >> > What I'm testing is a simple script which runs on every node in the
> > > >> > cluster, checks whether a major compaction is already running on
> > > >> > that node, and if not, picks one region and runs a major compaction
> > > >> > on that one region.
> > > >> >
> > > >> > It's been running for some time and has helped us get our data into
> > > >> > much better shape, but now I'm not quite sure how to choose which
> > > >> > region to compact anymore. So far I was reading that node's
> > > >> > rs-status#regionStoreStats page, first choosing the regions with the
> > > >> > biggest number of storefiles, and then those with the biggest
> > > >> > storefile sizes.
> > > >> >
> > > >> > Is there maybe something more intelligent I could/should do?
> > > >> >
> > > >> > Thanks a lot!
> > >
> > > --
> > > Thanks,
> > > Michael Antonov
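Vlad's four steps in the thread above - walk the table directory, find each
region's oldest file, and nominate the region holding the overall oldest file -
can be sketched as a small shell function. This is only an illustration: it
assumes GNU find and a local directory laid out like HBase's
/HBASE_ROOT/data/namespace/table/region layout, whereas against a real cluster
you would parse `hdfs dfs -ls -R` timestamps instead of calling `find`. The
function name `oldest_region` is made up for the sketch.

```shell
# Sketch of the "oldest HFile wins" heuristic: given a table directory with
# one subdirectory per region, print the region whose oldest file has the
# earliest modification time (a stand-in for creation time). Illustrative
# only - walks a local filesystem; on a cluster, parse `hdfs dfs -ls -R`.
oldest_region() {
    table_dir="$1"
    best_region=""; best_ts=""
    for region in "$table_dir"/*/; do
        # Oldest file in this region, as an epoch timestamp (GNU find)
        ts=$(find "$region" -type f -printf '%T@\n' 2>/dev/null | sort -n | head -n 1)
        [ -z "$ts" ] && continue
        if [ -z "$best_ts" ] || awk -v a="$ts" -v b="$best_ts" 'BEGIN { exit !(a < b) }'; then
            best_ts="$ts"; best_region="${region%/}"
        fi
    done
    printf '%s\n' "$best_region"
}
```

The returned region directory name would then be the one to hand to
`major_compact` in the shell.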
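Dejan's working heuristic - most store files first, biggest total store file
size as the tie-breaker - reduces to a sort once the per-region stats are
scraped into a plain table. The three-column input format below (region name,
storefile count, total storefile size in MB) is an assumption for illustration,
not a format HBase emits directly, and feeding the winner into `major_compact`
is sketched only as a comment:

```shell
# Rank regions by storefile count (descending), then by total storefile
# size (descending), and print the top candidate for major compaction.
# Assumed input, one region per line (hypothetical scrape format):
#   <region-name> <storefileCount> <storefileSizeMB>
pick_compaction_candidate() {
    sort -k2,2nr -k3,3nr | head -n 1 | awk '{ print $1 }'
}
# The winner would then be fed back to the shell, e.g.:
#   echo "major_compact '<region-name>'" | hbase shell
```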
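Mikhail's counter-proposal - compact the region with the smallest average
storefile size - is just as easy to script over scraped per-region stats. The
input columns assumed below (region name, storefile count, total storefile size
in MB) are again a hypothetical scrape format; regions with zero storefiles are
skipped, since there is nothing to compact:

```shell
# Pick the region whose storefiles have the smallest average size.
# Assumed input, one region per line (hypothetical scrape format):
#   <region-name> <storefileCount> <storefileSizeMB>
smallest_avg_region() {
    awk '$2 > 0 {
        avg = $3 / $2
        if (best == "" || avg < best_avg) { best = $1; best_avg = avg }
    }
    END { if (best != "") print best }'
}
```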