hbase-user mailing list archives

From Frank Luo <j...@merkleinc.com>
Subject RE: Major Compaction Strategy
Date Fri, 29 Apr 2016 22:35:42 GMT
Try getting the code from the dev branch.

The master and rel_1.1 branches are still on HBase 0.98.

-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: Friday, April 29, 2016 5:34 PM
To: user@hbase.apache.org
Subject: Re: Major Compaction Strategy

Interesting.

When compiling against hbase 1.1.2, I got:

http://pastebin.com/NfUjva9R

FYI

On Fri, Apr 29, 2016 at 3:29 PM, Frank Luo <jluo@merkleinc.com> wrote:

> Saad,
>
> Are all your tables/regions in use 24/7, or is only a subset of the
> regions in use at any given time while the others sit idle?
>
> If the latter, I developed a tool to launch major compactions in a
> "smart" way, because I am facing a similar issue:
> https://github.com/jinyeluo/smarthbasecompactor
>
> It looks at every RegionServer, finds the non-hot regions with the most
> store files, and starts compacting them. It keeps going until the time
> is up. Just to be clear, it doesn't perform the major compaction (MC)
> itself, which would be a scary thing to do; it tells the region servers
> to do the MC.
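A minimal sketch of the selection step described above, not the tool's actual code. RegionLoadInfo, pickCandidates and the request-rate threshold are hypothetical names invented for illustration; the real tool may measure "hot" differently. The sketch only ranks candidates on one region server and leaves out the time limit and the compaction requests.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class CompactionCandidatePicker {

    /** Hypothetical snapshot of one region's load; this is not an HBase type. */
    static final class RegionLoadInfo {
        final String regionName;
        final int storeFileCount;
        final long requestsPerSecond;

        RegionLoadInfo(String regionName, int storeFileCount, long requestsPerSecond) {
            this.regionName = regionName;
            this.storeFileCount = storeFileCount;
            this.requestsPerSecond = requestsPerSecond;
        }
    }

    /**
     * Returns the regions of one region server whose request rate is at or
     * below hotThreshold, ordered by store file count, most files first.
     */
    static List<RegionLoadInfo> pickCandidates(List<RegionLoadInfo> regionsOnServer,
                                               long hotThreshold) {
        List<RegionLoadInfo> candidates = new ArrayList<>();
        for (RegionLoadInfo region : regionsOnServer) {
            // Assumption: "non-hot" means a low current request rate.
            if (region.requestsPerSecond <= hotThreshold) {
                candidates.add(region);
            }
        }
        candidates.sort(
            Comparator.comparingInt((RegionLoadInfo r) -> r.storeFileCount).reversed());
        return candidates;
    }

    public static void main(String[] args) {
        List<RegionLoadInfo> server = new ArrayList<>();
        server.add(new RegionLoadInfo("region-a", 12, 5));
        server.add(new RegionLoadInfo("region-b", 30, 900)); // hot, so skipped
        server.add(new RegionLoadInfo("region-c", 20, 0));
        for (RegionLoadInfo r : pickCandidates(server, 100)) {
            System.out.println(r.regionName + " storeFiles=" + r.storeFileCount);
        }
    }
}
```

A caller would work through the returned list from the front, asking the region server to major-compact each region in turn and stopping once the time window closes.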
>
> We have it running in our cluster for about 10 hours a day. It has
> virtually no impact on applications, and the cluster is doing far
> better than it did with the default scheduled MC.
>
>
> -----Original Message-----
> From: Saad Mufti [mailto:saad.mufti@gmail.com]
> Sent: Friday, April 29, 2016 1:51 PM
> To: user@hbase.apache.org
> Subject: Re: Major Compaction Strategy
>
> We have more issues now. After testing this in dev, we tried rolling
> compaction in our production cluster, which has tons of data (60 region
> servers and around 7000 regions). Most regions, at around 6-7 GB in
> size, took 4-5 minutes each to finish. Based on this we estimated that
> a single run would take something like 20 days, which doesn't seem
> reasonable.
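The arithmetic behind that estimate can be checked with a short sketch. It assumes every region takes the quoted 4-5 minutes and that regions are compacted strictly one at a time across the whole cluster; the class and method names are made up for illustration. With 7000 regions this gives roughly 19 to 24 days.

```java
import java.util.Locale;

class SequentialCompactionEstimate {

    /** Days needed to compact every region one after another. */
    static double sequentialDays(int regionCount, double minutesPerRegion) {
        double minutesPerDay = 60.0 * 24.0;
        return regionCount * minutesPerRegion / minutesPerDay;
    }

    public static void main(String[] args) {
        int regions = 7000; // figure quoted in the thread
        System.out.println(String.format(Locale.ROOT,
            "4 min/region: %.1f days", sequentialDays(regions, 4.0)));
        System.out.println(String.format(Locale.ROOT,
            "5 min/region: %.1f days", sequentialDays(regions, 5.0)));
    }
}
```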
>
> So is it more reasonable to aim for doing major compaction across all
> region servers at once, but within each RS one region at a time? That
> would cut it down to around 8 hours, which is still very long. Or is it
> better to compact all regions on one region server, then move to the next?
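The first option can be sketched as one worker per region server, each walking its own regions in order. With regions spread evenly, 7000 / 60 is about 117 regions per server, and at 4-5 minutes each that is roughly 8 to 10 hours, in line with the figure above. Everything named here (PerServerRollingCompaction, compactAll, the compactOneRegion callback) is hypothetical, and the callback is assumed to block until the region's compaction has finished.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

class PerServerRollingCompaction {

    /**
     * Calls compactOneRegion for every region. Regions on the same server
     * are handled one after another; different servers run in parallel.
     */
    static void compactAll(Map<String, List<String>> regionsByServer,
                           Consumer<String> compactOneRegion) {
        if (regionsByServer.isEmpty()) {
            return;
        }
        ExecutorService pool = Executors.newFixedThreadPool(regionsByServer.size());
        try {
            List<Future<?>> workers = new ArrayList<>();
            for (List<String> regions : regionsByServer.values()) {
                workers.add(pool.submit(() -> {
                    for (String region : regions) {
                        compactOneRegion.accept(region); // assumed to block
                    }
                }));
            }
            for (Future<?> worker : workers) {
                try {
                    worker.get(); // wait, and surface any worker failure
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("a worker failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> layout = new TreeMap<>();
        layout.put("rs1", Arrays.asList("region-1", "region-2"));
        layout.put("rs2", Arrays.asList("region-3"));
        compactAll(layout, region -> System.out.println("compacted " + region));
    }
}
```

This bounds the load on each region server to one major compaction at a time, at the cost of all servers doing compaction I/O simultaneously.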
>
> The goal of all this is to maintain decent write performance while
> still doing compaction. We don't have a good low-load period for our
> cluster, so we are trying to find a way to do this without cluster
> downtime.
>
> Thanks.
>
> ----
> Saad
>
>
> On Wed, Apr 20, 2016 at 1:19 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
>
> > Thanks for the pointer. Working like a charm.
> >
> > ----
> > Saad
> >
> >
> > On Tue, Apr 19, 2016 at 4:01 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> >> Please use the following method of HBaseAdmin:
> >>
> >>   public CompactionState getCompactionStateForRegion(final byte[] regionName)
> >>
> >> Cheers
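One way to use that method is to request the compaction and then poll the state until it returns to NONE. The sketch below is self-contained, so it does not call HBase directly: CompactionAdmin is a hypothetical stand-in for the two HBaseAdmin methods named in this thread, and State is a local copy of the CompactionState values. The poll interval and timeout are illustrative parameters.

```java
import java.util.concurrent.TimeUnit;

class BlockingMajorCompaction {

    /** Local copy of the CompactionState values, to keep the sketch self-contained. */
    enum State { NONE, MINOR, MAJOR, MAJOR_AND_MINOR }

    /** Hypothetical stand-in for the two HBaseAdmin calls named in the thread. */
    interface CompactionAdmin {
        void majorCompactRegion(byte[] regionName);
        State getCompactionStateForRegion(byte[] regionName);
    }

    /**
     * Requests a major compaction, then polls until the region reports NONE.
     * Returns false if the timeout passes or the thread is interrupted.
     */
    static boolean compactAndWait(CompactionAdmin admin, byte[] regionName,
                                  long pollMillis, long timeoutMillis) {
        admin.majorCompactRegion(regionName); // asynchronous request
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() - deadline < 0) {
            // Sleep before the first poll: the request may still be queued,
            // in which case the state can read NONE before work has begun.
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            if (admin.getCompactionStateForRegion(regionName) == State.NONE) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Fake admin that reports MAJOR for two polls, then NONE.
        CompactionAdmin fake = new CompactionAdmin() {
            private int polls = 0;
            public void majorCompactRegion(byte[] regionName) { }
            public State getCompactionStateForRegion(byte[] regionName) {
                return ++polls < 3 ? State.MAJOR : State.NONE;
            }
        };
        System.out.println(compactAndWait(fake, new byte[] {1}, 10, 1000));
    }
}
```

Against a real cluster the interface would be backed by an adapter around HBaseAdmin that handles its IOExceptions. A single sleep before the first poll is only a heuristic; a queued compaction that starts later than one poll interval would still be missed.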
> >>
> >> On Tue, Apr 19, 2016 at 12:56 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > We have a large HBase 1.x cluster in AWS and have disabled
> >> > automatic major compaction as advised. We were running our own
> >> > code for compaction daily around midnight which calls
> >> > HBaseAdmin.majorCompactRegion(byte[] regionName) in a rolling
> >> > fashion across all regions.
> >> >
> >> > But we missed the fact that this is an asynchronous operation, so
> >> > in practice this causes major compaction to run across all
> >> > regions, at least those not already major compacted (for example
> >> > because previous minor compactions got upgraded to major ones).
> >> >
> >> > We don't really have a suitable low load period, so what is a
> >> > suitable way to make major compaction run in a rolling fashion
> >> > region by region? The API above provides no return value for us
> >> > to be able to wait for one compaction to finish before moving to
> >> > the next.
> >> >
> >> > Thanks.
> >> >
> >> > ----
> >> > Saad
> >> >
> >>
> >
> >
> This email and any attachments transmitted with it are intended for
> use by the intended recipient(s) only. If you have received this email
> in error, please notify the sender immediately and then delete it. If
> you are not the intended recipient, you must not keep, use, disclose,
> copy or distribute this email without the author’s prior permission.
> We take precautions to minimize the risk of transmitting software
> viruses, but we advise you to perform your own virus checks on any
> attachment to this message. We cannot accept liability for any loss or
> damage caused by software viruses. The information contained in this
> communication may be confidential and may be subject to the attorney-client privilege.
>