hbase-user mailing list archives

From Saad Mufti <saad.mu...@gmail.com>
Subject Re: Major Compaction Strategy
Date Fri, 29 Apr 2016 22:37:23 GMT
Unfortunately, all our tables and regions are active 24/7. Traffic does
fall somewhat at night, but there is no real downtime.

It is not user-facing load, though, so I guess we could turn off traffic for
a while as data queues up in Kafka. But not for too long, as then we would be
playing catch-up.

----
Saad

On Friday, April 29, 2016, Frank Luo <jluo@merkleinc.com> wrote:

> Saad,
>
> Are all your tables/regions in use 24/7, or is only a subset of regions in
> use at any given time while the others sit idle?
>
> If the latter, I developed a tool that launches major compactions in a
> "smart" way, because I am facing a similar issue:
> https://github.com/jinyeluo/smarthbasecompactor
>
> It looks at every RegionServer, finds the non-hot regions with the most
> store files, and starts compacting them. It keeps going until time is up.
> Just to be clear, it doesn't perform the major compaction (MC) itself, which
> would be a scary thing to do; it tells the region servers to do the MC.
>
> We have it running in our cluster for about 10 hours a day. It has
> virtually no impact on applications, and the cluster is doing far better
> than it did with the default scheduled MC.
>
>
> -----Original Message-----
> From: Saad Mufti [mailto:saad.mufti@gmail.com]
> Sent: Friday, April 29, 2016 1:51 PM
> To: user@hbase.apache.org
> Subject: Re: Major Compaction Strategy
>
> We have more issues now. After testing this in dev, we tried rolling
> compaction in our production cluster, which has tons of data (60 region
> servers and around 7000 regions). Most regions, at around 6-7 GB in size,
> took 4-5 minutes each to finish. Based on this, we estimated that a single
> run would take something like 20 days, which doesn't seem reasonable.
>
> So is it more reasonable to aim for doing major compaction across all
> region servers at once, but within each RS one region at a time? That would
> cut it down to around 8 hours, which is still very long. Or is it better to
> compact all regions on one region server, then move to the next?
>
> The goal of all this is to maintain decent write performance while still
> doing compaction. We don't have a good low-load period for our cluster,
> so we are trying to find a way to do this without cluster downtime.
>
> Thanks.
>
> ----
> Saad
>
>
> On Wed, Apr 20, 2016 at 1:19 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
>
> > Thanks for the pointer. Working like a charm.
> >
> > ----
> > Saad
> >
> >
> > On Tue, Apr 19, 2016 at 4:01 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> >> Please use the following method of HBaseAdmin:
> >>
> >>   public CompactionState getCompactionStateForRegion(final byte[] regionName)
> >>
> >> Cheers
> >>
> >> On Tue, Apr 19, 2016 at 12:56 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > We have a large HBase 1.x cluster in AWS and have disabled automatic
> >> > major compaction as advised. We were running our own code for
> >> > compaction daily around midnight which calls
> >> > HBaseAdmin.majorCompactRegion(byte[] regionName) in a rolling fashion
> >> > across all regions.
> >> >
> >> > But we missed the fact that this is an asynchronous operation, so in
> >> > practice this causes major compaction to run across all regions, at
> >> > least those not already major compacted (for example because previous
> >> > minor compactions got upgraded to major ones).
> >> >
> >> > We don't really have a suitable low load period, so what is a suitable
> >> > way to make major compaction run in a rolling fashion region by region?
> >> > The API above provides no return value for us to be able to wait for
> >> > one compaction to finish before moving to the next.
> >> >
> >> > Thanks.
> >> >
> >> > ----
> >> > Saad
> >> >
> >>
> >
> >
>
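
The rolling approach discussed in the thread (request a major compaction, then poll the region's compaction state before moving to the next region) can be sketched roughly as below. This is only an illustration under stated assumptions: CompactionAdmin, FakeAdmin, compactOneByOne, the poll interval, and the timeout are hypothetical names introduced here, not part of HBase. In a real client the two interface methods would presumably wrap HBaseAdmin.majorCompactRegion(byte[] regionName) and HBaseAdmin.getCompactionStateForRegion(byte[] regionName), the two calls named in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RollingCompaction {

    /** Hypothetical wrapper around the two admin calls named in the thread. */
    interface CompactionAdmin {
        // Assumed to wrap HBaseAdmin.majorCompactRegion(byte[]): asynchronous, returns at once.
        void requestMajorCompaction(String regionName) throws Exception;

        // Assumed to wrap HBaseAdmin.getCompactionStateForRegion(byte[]) and
        // report true while the returned state is anything other than NONE.
        boolean isCompacting(String regionName) throws Exception;
    }

    /**
     * Compacts regions strictly one at a time. Returns the regions whose
     * compaction did not finish within timeoutMillis.
     */
    static List<String> compactOneByOne(CompactionAdmin admin, List<String> regions,
                                        long pollMillis, long timeoutMillis) throws Exception {
        List<String> timedOut = new ArrayList<>();
        for (String region : regions) {
            admin.requestMajorCompaction(region);
            long deadline = System.currentTimeMillis() + timeoutMillis;
            boolean finished = false;
            while (System.currentTimeMillis() < deadline) {
                if (!admin.isCompacting(region)) {
                    finished = true;
                    break;
                }
                Thread.sleep(pollMillis); // wait before asking again
            }
            if (!finished) {
                timedOut.add(region);
            }
        }
        return timedOut;
    }

    /** In-memory stand-in for a cluster: each compaction "runs" for a fixed number of polls. */
    static class FakeAdmin implements CompactionAdmin {
        final Map<String, Integer> pollsLeft = new HashMap<>();
        final List<String> requested = new ArrayList<>();
        int maxConcurrent = 0;
        private final int pollsPerCompaction;

        FakeAdmin(int pollsPerCompaction) {
            this.pollsPerCompaction = pollsPerCompaction;
        }

        @Override
        public void requestMajorCompaction(String regionName) {
            requested.add(regionName);
            pollsLeft.put(regionName, pollsPerCompaction);
            int running = 0;
            for (int left : pollsLeft.values()) {
                if (left > 0) {
                    running++;
                }
            }
            maxConcurrent = Math.max(maxConcurrent, running); // track overlap
        }

        @Override
        public boolean isCompacting(String regionName) {
            int left = pollsLeft.getOrDefault(regionName, 0);
            if (left > 0) {
                pollsLeft.put(regionName, left - 1);
            }
            return left > 0;
        }
    }

    public static void main(String[] args) throws Exception {
        FakeAdmin admin = new FakeAdmin(3);
        List<String> timedOut =
                compactOneByOne(admin, Arrays.asList("r1", "r2", "r3"), 1, 1000);
        System.out.println("requested=" + admin.requested
                + " timedOut=" + timedOut
                + " maxConcurrent=" + admin.maxConcurrent);
    }
}
```

One caveat with any polling scheme like this: because the compaction request is asynchronous, the state may still read as idle for a short time after the request is queued, so a real implementation would probably want an initial delay or a "has it started yet" check before trusting an idle reading.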

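
The two time estimates quoted in the thread (roughly 20 days for a fully serial pass, roughly 8 hours with one region at a time on each region server) follow from simple arithmetic on the quoted figures: 7000 regions, 60 region servers, and 4-5 minutes per region. The sketch below reproduces that calculation. It assumes regions are spread evenly across servers and that compactions running on different servers do not slow each other down, neither of which the thread confirms. The class and method names are hypothetical.

```java
class CompactionEstimate {

    /**
     * Minutes for one full pass when `parallelism` regions are compacted at
     * the same time and each compaction takes `minutesPerRegion`.
     */
    static double passMinutes(int regions, int parallelism, double minutesPerRegion) {
        // Each parallel lane works through ceil(regions / parallelism) regions in sequence.
        int perLane = (regions + parallelism - 1) / parallelism;
        return perLane * minutesPerRegion;
    }

    public static void main(String[] args) {
        int regions = 7000; // figure quoted in the thread
        int servers = 60;   // figure quoted in the thread
        for (double minutes : new double[] {4.0, 5.0}) {
            double serialDays = passMinutes(regions, 1, minutes) / (60 * 24);
            double perServerHours = passMinutes(regions, servers, minutes) / 60;
            System.out.printf(
                    "%.0f min/region: serial %.1f days, one region per RS %.1f hours%n",
                    minutes, serialDays, perServerHours);
        }
    }
}
```

With these inputs the serial pass comes out at about 19 to 24 days, and the one-region-per-server pass at about 8 to 10 hours (117 regions per server at 4-5 minutes each), which matches the rough figures in the thread.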