hbase-user mailing list archives

From Saad Mufti <saad.mu...@gmail.com>
Subject Re: Major Compaction Strategy
Date Fri, 29 Apr 2016 18:51:13 GMT
We are running into more issues now. After testing this in dev, we tried
rolling compaction in our production cluster, which has tons of data (60
region servers and around 7000 regions). Most regions, at around 6-7 GB in
size, took 4-5 minutes each to finish. At that rate we estimated that a
single run over all regions would take something like 20 days, which
doesn't seem reasonable.

So is it more reasonable to run major compaction on all region servers at
once, but only one region at a time within each RS? That would cut a run
down to around 8 hours, which is still very long. Or is it better to
compact all regions on one region server, then move on to the next?
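
For reference, here is the arithmetic behind those two estimates as a small
sketch. The class and method names are made up for illustration, and it
assumes regions are spread evenly across servers and that a region takes the
same time to compact no matter how many other servers are compacting at once,
which may well not hold on a loaded cluster.

```java
// Back-of-the-envelope estimate; CompactionEstimate is a hypothetical helper.
final class CompactionEstimate {

    /**
     * Wall-clock minutes for one full run when `parallelism` regions are
     * compacted at the same time (1 = strictly serial, 60 = one per server).
     * Assumes an even spread of regions and a constant time per region.
     */
    static double totalMinutes(int regions, double minutesPerRegion, int parallelism) {
        // Each parallel "lane" works through ceil(regions / parallelism) regions.
        long regionsPerLane = (regions + parallelism - 1) / parallelism;
        return regionsPerLane * minutesPerRegion;
    }
}
```

With 7000 regions at 4.5 minutes each, totalMinutes(7000, 4.5, 1) gives 31500
minutes (roughly 22 days), and totalMinutes(7000, 4.5, 60) gives 526.5 minutes
(a bit under 9 hours), which is in line with the figures above.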

The goal of all this is to maintain decent write performance while still
doing compaction. We don't have a good low-load period for our cluster,
so we are trying to find a way to do this without cluster downtime.
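
To make the first option concrete (all region servers in parallel, one region
at a time on each), here is a rough sketch of the control loop. It is not
tested against a real cluster. RegionCompactor is a hypothetical interface,
not an HBase class; in practice it would wrap the two admin calls mentioned
earlier in this thread (majorCompactRegion to request the compaction, and
getCompactionStateForRegion to poll for completion). Region and server names
are plain strings here only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class RollingCompaction {

    /** Hypothetical wrapper around the HBase admin calls; not part of HBase. */
    interface RegionCompactor {
        // Would delegate to majorCompactRegion(regionName), which is asynchronous.
        void requestMajorCompaction(String regionName) throws Exception;
        // Would delegate to getCompactionStateForRegion(regionName).
        boolean isCompacting(String regionName) throws Exception;
    }

    /**
     * Compacts every region: one region at a time per server, all servers in
     * parallel. regionsByServer maps a server name to the regions it hosts.
     */
    static void run(Map<String, List<String>> regionsByServer,
                    RegionCompactor compactor,
                    long pollMillis) throws Exception {
        if (regionsByServer.isEmpty()) {
            return; // nothing to do; a fixed pool of size 0 is not allowed
        }
        // One worker thread per region server.
        ExecutorService pool = Executors.newFixedThreadPool(regionsByServer.size());
        try {
            List<Future<?>> lanes = new ArrayList<>();
            for (List<String> regions : regionsByServer.values()) {
                lanes.add(pool.submit(() -> {
                    for (String region : regions) {
                        compactor.requestMajorCompaction(region);
                        // Wait for this region before starting the next one.
                        while (compactor.isCompacting(region)) {
                            Thread.sleep(pollMillis);
                        }
                    }
                    return null;
                }));
            }
            for (Future<?> lane : lanes) {
                lane.get(); // rethrows any failure from that server's lane
            }
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Two caveats with this sketch: the compaction request is asynchronous, so the
state may not show a compaction in progress immediately after the request and
a real loop would likely need a short delay or a retry before trusting an
idle state; and regions can move between servers during a long run, so the
server-to-region mapping would need refreshing.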

Thanks.

----
Saad


On Wed, Apr 20, 2016 at 1:19 PM, Saad Mufti <saad.mufti@gmail.com> wrote:

> Thanks for the pointer. Working like a charm.
>
> ----
> Saad
>
>
> On Tue, Apr 19, 2016 at 4:01 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> Please use the following method of HBaseAdmin:
>>
>>   public CompactionState getCompactionStateForRegion(final byte[] regionName)
>>
>> Cheers
>>
>> On Tue, Apr 19, 2016 at 12:56 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > We have a large HBase 1.x cluster in AWS and have disabled automatic
>> > major compaction as advised. We were running our own code for
>> > compaction daily around midnight which calls
>> > HBaseAdmin.majorCompactRegion(byte[] regionName) in a rolling fashion
>> > across all regions.
>> >
>> > But we missed the fact that this is an asynchronous operation, so in
>> > practice this causes major compaction to run across all regions, at
>> > least those not already major compacted (for example because previous
>> > minor compactions got upgraded to major ones).
>> >
>> > We don't really have a suitable low load period, so what is a suitable
>> > way to make major compaction run in a rolling fashion region by region?
>> > The API above provides no return value for us to be able to wait for
>> > one compaction to finish before moving to the next.
>> >
>> > Thanks.
>> >
>> > ----
>> > Saad
>> >
>>
>
>
