hbase-dev mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Compactions nice to have features
Date Sun, 05 Oct 2014 22:01:58 GMT
>>- rack IO throttle. We should add that to accommodate for over subscription at the
>>ToR level.
> Can you decipher that, Lars?

ToR is the "Top of Rack" switch. Oversubscription means that a ToR switch usually does not have
enough bandwidth to serve traffic in and out of the rack at full speed.
For example, if you had 40 machines in a rack with a 1GbE link each, and the ToR switch has a
10GbE uplink, you'd say the ToR switch is 4 to 1 oversubscribed.
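
As a quick check of that arithmetic, here is a minimal sketch. The function name and parameters are made up for illustration and are not part of any HBase or switch API.

```python
from fractions import Fraction


def oversubscription_ratio(hosts: int, host_link_gbps: int, uplink_gbps: int) -> Fraction:
    """Total host-facing bandwidth divided by uplink bandwidth (hypothetical helper)."""
    return Fraction(hosts * host_link_gbps, uplink_gbps)


# 40 machines with 1 Gbit/s each behind a 10 Gbit/s uplink.
ratio = oversubscription_ratio(hosts=40, host_link_gbps=1, uplink_gbps=10)
print(f"{ratio.numerator}:{ratio.denominator}")  # 4:1
```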


Was just trying to say: "Yeah, we need that" :)

>>- cluster wide compaction storms. Yeah, that's bad. Can be alleviated by spreading
>>timed major compactions out. (in our clusters we set the
>>interval to 1 week and the jitter to 1/2 week)
> I think we have some JIRAs for that? 

That you can already do:
hbase.hregion.majorcompaction defaults to 1 day in 0.94 (86400000 ms). That means *all* data is
rewritten *every single* day. We set it to 604800000 ms (1 week).
hbase.hregion.majorcompaction.jitter defaults to 20% (0.2). We set this to 0.5 (so we spread
out the timed major compactions over 1 week, to avoid storms).
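
To make those numbers concrete, here is a rough sketch of the window in which a region's timed major compaction would fall. It assumes the jitter is applied as a symmetric fraction of the interval (interval +- interval * jitter), which is how I read the settings; the exact scheduling inside HBase may differ, and the helper name is made up.

```python
from typing import Tuple

DAY_MS = 86_400_000


def major_compaction_window_ms(period_ms: int, jitter: float) -> Tuple[int, int]:
    """Earliest and latest delay between timed major compactions.

    Assumption: jitter is a symmetric fraction of the period.
    """
    spread = round(period_ms * jitter)
    return period_ms - spread, period_ms + spread


for label, period_ms, jitter in [
    ("0.94 defaults", 86_400_000, 0.2),
    ("1 week, jitter 0.5", 604_800_000, 0.5),
]:
    lo, hi = major_compaction_window_ms(period_ms, jitter)
    print(f"{label}: {lo / DAY_MS:.1f} to {hi / DAY_MS:.1f} days")
```

With the second setting the window is a full week wide (3.5 to 10.5 days), so regions that would otherwise all compact together are spread out.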

Just checked 0.98. Turns out these are exactly the defaults there (1 week +- 1/2 week). Cool,
forgot about that (see HBASE-8450). So in 0.98+ the defaults should be pretty good on this
end.

Compaction storms can still happen during normal load when writes are spread evenly across
very many regions. In that case it's not unlikely that many regions decide to compact at the
same time.


-- Lars



________________________________
 From: Vladimir Rodionov <vladrodionov@gmail.com>
To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars hofhansl <larsh@apache.org>

Sent: Sunday, October 5, 2014 2:11 PM
Subject: Re: Compactions nice to have features
 



>> A few comments:
>> - bulkload - you mean not by loading pre-created HFiles? If you do that
>> there would be no compaction during the import as the files are simply
>> moved into place.

Bulk load is not always convenient or feasible. We have batched mutation support in the API, but
compaction is still a serious issue. Cassandra allows compactions to be disabled/enabled (I think
it's cluster-wide, not sure though), so why shouldn't we have that too?

>>- local compaction IO limit. Limiting the number of compaction threads
>>(1 by default) is not good enough ... ? You can cause too much harm even
>>with a single thread compacting per region server?

This one I am not sure about myself. The idea is to make compaction nicer with respect to I/O.
For example, read operations and memstore flushes should have higher priority than compaction
I/O. One way is to limit (throttle) compaction bandwidth locally; there are some other
approaches as well.
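
One possible shape for such a local throttle is a token bucket on bytes written. The sketch below is purely illustrative: the class and function names are hypothetical, nothing here is an existing HBase API, and it covers only the bandwidth limit, not prioritizing reads and flushes over compaction.

```python
import time


class TokenBucket:
    """Hypothetical byte-rate limiter for compaction I/O (not an HBase class)."""

    def __init__(self, bytes_per_sec: float, burst_bytes: float, clock=time.monotonic):
        self.rate = bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.clock = clock
        self.last = clock()

    def delay_for(self, nbytes: int) -> float:
        """Charge nbytes and return how many seconds the caller should wait."""
        now = self.clock()
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        # A negative balance is the debt the caller has to wait off.
        return 0.0 if self.tokens >= 0 else -self.tokens / self.rate


def throttled_copy(chunks, sink, bucket):
    """Append chunks to sink, sleeping as needed to stay under the limit."""
    for chunk in chunks:
        time.sleep(bucket.delay_for(len(chunk)))
        sink.append(chunk)
```

A compaction writer would call delay_for() before each block it writes. The burst size controls how far compaction may briefly run ahead of the configured rate.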


>>- rack IO throttle. We should add that to accommodate for over subscription at the
>>ToR level.

Can you decipher that, Lars?

>>- cluster wide compaction storms. Yeah, that's bad. Can be alleviated by
>>spreading timed major compactions out. (in our clusters we set the
>>interval to 1 week and the jitter to 1/2 week)


I think we have some JIRAs for that? 


>>- what do you think about off-peak compaction? We have that in part as
>>the compaction ratio can be set differently for off peak hours


Off peak compactions can have higher limits or even different policies.


>>Generally I like the idea of being able to pace compaction better.
>>Do you want to file jiras for these? 

Yeah, will do that. 





On Sat, Oct 4, 2014 at 10:31 AM, lars hofhansl <larsh@apache.org> wrote:

Hi Vladimir,
>
>these are very interesting.
>A few comments:
>- bulkload - you mean not by loading pre-created HFiles? If you do that there would be
>no compaction during the import as the files are simply moved into place.
>- local compaction IO limit. Limiting the number of compaction threads (1 by default)
>is not good enough ... ? You can cause too much harm even with a single thread compacting
>per region server?
>
>- rack IO throttle. We should add that to accommodate for over subscription at the ToR
>level.
>- cluster wide compaction storms. Yeah, that's bad. Can be alleviated by spreading timed
>major compactions out. (in our clusters we set the interval to 1 week and the jitter to 1/2
>week)
>- what do you think about off-peak compaction? We have that in part as the compaction
>ratio can be set differently for off peak hours
>
>
>Generally I like the idea of being able to pace compaction better.
>Do you want to file jiras for these? Doesn't mean you have to do all the work :)
>
>
>-- Lars
>
>
>
>________________________________
> From: Vladimir Rodionov <vladrodionov@gmail.com>
>To: "dev@hbase.apache.org" <dev@hbase.apache.org>
>Sent: Friday, October 3, 2014 10:34 PM
>Subject: Compactions nice to have features
>
>
>
>I am thinking about the following:
>
>1. Compaction On/Off per CF, Table, cluster. Both: minor and major
>
>Good during bulk load.
>
>- Disable compaction for table 'T'
>- Load 1B rows
>- Enable compaction for table 'T'
>
>2. Local Compaction I/O throttle
>
>Set I/O limit per RS
>
>3. Rack Compaction I/O throttle
>
>Set I/O limit per server rack. Good to control uplink bandwidth.
>
>4. Cluster Compaction I/O throttle. Good to avoid compaction storms
>
>-Vladimir Rodionov