hbase-user mailing list archives

From Frank Luo <j...@merkleinc.com>
Subject RE: Major Compaction Strategy
Date Fri, 29 Apr 2016 22:29:56 GMT

Will all your tables/regions be used 24/7, or will only a subset of regions be in use at any
given time while the others sit idle?

If the latter, I developed a tool to launch major compactions in a "smart" way, because I am
facing a similar issue: https://github.com/jinyeluo/smarthbasecompactor.

It looks at every RegionServer, finds the non-hot regions with the most store files, and starts
compacting them, continuing until its time budget is up. Just to be clear, it doesn't perform
the major compaction itself, which would be a scary thing to do; it tells the region servers to do it.
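The selection policy described above can be sketched in plain Java. This is a hypothetical simplification, not the code from the linked smarthbasecompactor repository: `pickRegion` and its inputs are made-up names, and in the real tool the chosen region would then be handed to HBase's `Admin.majorCompactRegion`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Sketch of a "smart" picker: per RegionServer, choose the idle (non-hot)
// region with the most store files, since that region benefits most from a
// major compaction. Hypothetical names; not the actual tool's code.
public class SmartCompactPicker {

    /** Returns the non-hot region with the most store files, or null if none qualifies. */
    static String pickRegion(Map<String, Integer> storeFilesByRegion, Set<String> hotRegions) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : storeFilesByRegion.entrySet()) {
            if (hotRegions.contains(e.getKey())) continue; // skip regions taking live traffic
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> files = new LinkedHashMap<>();
        files.put("region-a", 12);
        files.put("region-b", 19); // most store files, but currently hot
        files.put("region-c", 15);
        Set<String> hot = Set.of("region-b");
        // region-c wins: it has the most store files among idle regions
        System.out.println(pickRegion(files, hot));
    }
}
```

In the real tool the "hot" set and store-file counts would come from cluster metrics rather than hard-coded maps.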

We have it running in our cluster for about 10 hours a day; it has virtually no impact on
applications, and the cluster is doing far better than it did with the default scheduled major compactions.

-----Original Message-----
From: Saad Mufti [mailto:saad.mufti@gmail.com]
Sent: Friday, April 29, 2016 1:51 PM
To: user@hbase.apache.org
Subject: Re: Major Compaction Strategy

We have more issues now. After testing this in dev, we tried rolling compaction in our production
cluster, which has tons of data (60 region servers and around 7000 regions), and most regions
around 6-7 GB in size were taking 4-5 minutes each to finish. Based on this we estimated a single
full pass would take something like 20 days, which doesn't seem reasonable.

So is it more reasonable to run major compaction across all region servers at once, but within
each RS one region at a time? That would cut it down to around 8 hours, which is still very long.
Or is it better to compact all regions on one region server, then move to the next?

The goal of all this is to maintain decent write performance while still doing compaction.
We don't have a good low-load period for our cluster, so we are trying to find a way to do this
without cluster downtime.
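The arithmetic behind these estimates can be checked with the numbers from the thread (about 7000 regions, 60 region servers, roughly 4.5 minutes per region); the method names below are just illustrative:

```java
// Back-of-the-envelope schedule estimates for rolling major compaction,
// using the figures quoted in the thread.
public class CompactionEstimate {

    /** One region at a time across the whole cluster. */
    static double serialDays(int regions, double minutesPerRegion) {
        return regions * minutesPerRegion / 60.0 / 24.0;
    }

    /** One region at a time per RegionServer, all servers in parallel,
     *  assuming regions are spread evenly across servers. */
    static double parallelHours(int regions, int servers, double minutesPerRegion) {
        return Math.ceil((double) regions / servers) * minutesPerRegion / 60.0;
    }

    public static void main(String[] args) {
        // Roughly 22 days serially vs. under 9 hours with per-RS parallelism,
        // close to the 20-day and 8-hour figures discussed above.
        System.out.printf("serial: ~%.1f days%n", serialDays(7000, 4.5));
        System.out.printf("parallel per-RS: ~%.1f hours%n", parallelHours(7000, 60, 4.5));
    }
}
```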



On Wed, Apr 20, 2016 at 1:19 PM, Saad Mufti <saad.mufti@gmail.com> wrote:

> Thanks for the pointer. Working like a charm.
> ----
> Saad
> On Tue, Apr 19, 2016 at 4:01 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> Please use the following method of HBaseAdmin:
>>   public CompactionState getCompactionStateForRegion(final byte[] regionName)
>> Cheers
>> On Tue, Apr 19, 2016 at 12:56 PM, Saad Mufti <saad.mufti@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > We have a large HBase 1.x cluster in AWS and have disabled automatic
>> > major compaction as advised. We were running our own code for
>> > compaction daily around midnight, which calls
>> > HBaseAdmin.majorCompactRegion(byte[] regionName) in a rolling fashion
>> > across all regions.
>> >
>> > But we missed the fact that this is an asynchronous operation, so in
>> > practice it causes major compaction to run across all regions at once,
>> > at least those not already major compacted (for example, because
>> > previous minor compactions got upgraded to major ones).
>> >
>> > We don't really have a suitable low-load period, so what is a suitable
>> > way to make major compaction run in a rolling fashion, region by
>> > region? The API above provides no return value that would let us wait
>> > for one compaction to finish before moving on to the next.
>> >
>> > Thanks.
>> >
>> > ----
>> > Saad
>> >
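The pattern Ted's `getCompactionStateForRegion` suggestion enables — trigger a compaction, then poll the region's state until it returns to NONE before moving to the next region — can be sketched like this. `CompactAdmin` is a stand-in interface for HBase's `Admin`, not the real API, so the sketch runs without a cluster:

```java
import java.util.List;

// Rolling major compaction: one region at a time, waiting for each to
// finish by polling its compaction state. The enum mirrors HBase's
// CompactionState values; CompactAdmin is a hypothetical stand-in.
public class RollingCompactor {
    enum CompactionState { NONE, MINOR, MAJOR, MAJOR_AND_MINOR }

    interface CompactAdmin {
        void majorCompactRegion(String regionName);            // async, like the real API
        CompactionState getCompactionStateForRegion(String regionName);
    }

    /** Compacts each region in turn, returning how many completed. */
    static int compactAll(CompactAdmin admin, List<String> regions) {
        int done = 0;
        for (String region : regions) {
            admin.majorCompactRegion(region);
            // Poll until this region settles before starting the next one.
            while (admin.getCompactionStateForRegion(region) != CompactionState.NONE) {
                try {
                    Thread.sleep(10); // poll interval; tune for a real cluster
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return done;
                }
            }
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        // Fake admin: each region reports MAJOR for two polls, then NONE.
        CompactAdmin fake = new CompactAdmin() {
            int polls;
            public void majorCompactRegion(String r) { polls = 0; }
            public CompactionState getCompactionStateForRegion(String r) {
                return ++polls > 2 ? CompactionState.NONE : CompactionState.MAJOR;
            }
        };
        System.out.println(compactAll(fake, List.of("r1", "r2", "r3"))); // prints 3
    }
}
```

With the real HBase 1.x Admin, the same loop would call `majorCompactRegion(byte[])` and `getCompactionStateForRegion(byte[])` with encoded region names; a production version would also want a timeout per region rather than polling forever.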