hbase-user mailing list archives

From Dejan Menges <dejan.men...@gmail.com>
Subject Re: Major Compaction Strategy
Date Fri, 29 Apr 2016 19:11:27 GMT
What we scripted quite easily: we run at most one major compaction per
table at a time (note also that we have much bigger regions than that, and
it takes a couple of hours to compact each one). We made a simple script
that finds the region to compact based on some parameters we define (mostly
we pick the region with the oldest files inside), and we don't start
compacting it if a major compaction is already running on that table.

Another thing we did was find the region(s) that made the most sense to
compact (the ones with the largest number of files to compact), and then we
tested compacting one or multiple regions per table per run. A lot of
testing went into figuring out what wouldn't affect our performance.
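The approach above can be sketched against the HBase 1.x Admin API. This is
a minimal illustration, not our actual script: it triggers one major
compaction per region and polls `getCompactionStateForRegion` (mentioned
later in this thread) until it finishes before moving on; the 30-second
polling interval is arbitrary, and the oldest-files selection heuristic is
left out for brevity.

```java
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.protobuf.generated.AdminProtos.GetRegionInfoResponse.CompactionState;

/** Rolling major compaction: one region of a table at a time (sketch). */
public class RollingCompactor {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    TableName table = TableName.valueOf(args[0]);
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      List<HRegionInfo> regions = admin.getTableRegions(table);
      for (HRegionInfo region : regions) {
        byte[] name = region.getRegionName();
        // majorCompactRegion only *requests* the compaction; it returns
        // immediately, so we must poll for completion ourselves.
        admin.majorCompactRegion(name);
        CompactionState state;
        do {
          Thread.sleep(30_000L);  // polling interval: an assumption
          state = admin.getCompactionStateForRegion(name);
        } while (state == CompactionState.MAJOR
              || state == CompactionState.MAJOR_AND_MINOR);
      }
    }
  }
}
```

Because each region blocks until its compaction completes, at most one
major compaction runs per table at a time, which is what keeps the write
impact bounded.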

On Fri, Apr 29, 2016 at 8:51 PM Saad Mufti <saad.mufti@gmail.com> wrote:

> We have more issues now. After testing this in dev, we tried rolling
> compaction in our production cluster, which has tons of data (60 region
> servers and around 7000 regions), and most regions that were around 6-7 GB
> in size were taking 4-5 minutes to finish. Based on this we estimated a
> single run would take something like 20 days, which doesn't seem
> reasonable.
>
> So is it more reasonable to aim for doing major compaction across all
> region servers at once, but one region at a time within each RS? That would
> cut it down to around 8 hours, which is still very long. Or is it better to
> compact all regions on one region server, then move on to the next?
>
> The goal of all this is to maintain decent write performance while still
> doing compaction. We don't have a good low-load period for our cluster, so
> we are trying to find a way to do this without cluster downtime.
>
> Thanks.
>
> ----
> Saad
>
>
> On Wed, Apr 20, 2016 at 1:19 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
>
> > Thanks for the pointer. Working like a charm.
> >
> > ----
> > Saad
> >
> >
> > On Tue, Apr 19, 2016 at 4:01 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> >> Please use the following method of HBaseAdmin:
> >>
> >>   public CompactionState getCompactionStateForRegion(final byte[] regionName)
> >>
> >> Cheers
> >>
> >> On Tue, Apr 19, 2016 at 12:56 PM, Saad Mufti <saad.mufti@gmail.com>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > We have a large HBase 1.x cluster in AWS and have disabled automatic
> >> > major compaction as advised. We were running our own code for
> >> > compaction daily around midnight, which calls
> >> > HBaseAdmin.majorCompactRegion(byte[] regionName) in a rolling fashion
> >> > across all regions.
> >> >
> >> > But we missed the fact that this is an asynchronous operation, so in
> >> > practice this causes major compaction to run across all regions at
> >> > once, at least those not already major compacted (for example because
> >> > previous minor compactions got upgraded to major ones).
> >> >
> >> > We don't really have a suitable low-load period, so what is a
> >> > suitable way to make major compaction run in a rolling fashion,
> >> > region by region? The API above provides no return value that would
> >> > let us wait for one compaction to finish before moving on to the
> >> > next.
> >> >
> >> > Thanks.
> >> >
> >> > ----
> >> > Saad
> >> >
> >>
> >
> >
>
