hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14383) Compaction improvements
Date Thu, 17 Sep 2015 23:04:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804682#comment-14804682 ]

Enis Soztutar commented on HBASE-14383:
---------------------------------------

bq. flush policy ignores all files less than 15MB.
Where is this code? I could not find anything in the periodic or non-periodic flush requests that prevents flush requests.
bq. maxlogs is really a function of heap available for the memstores and the HDFS block size used. Something like: maxlogs = memstore heap / (HDFS blocksize * 0.95)
This assumes that all memstores are getting updates. If a memstore stops getting updates, it will not flush for ~0.5 hour (expected) unless it is the biggest memstore left.
bq. Can we just default it to that? Maybe with 10% padding.
Maybe we can instead set the limit to 2x or 3x that value.
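
To make the arithmetic in the quoted formula concrete, here is a minimal Python sketch. The function name, the `headroom` parameter, and the choice to round up are assumptions made for illustration; only the formula itself and the 10% / 2x / 3x padding options come from the discussion above. It is not HBase code.

```python
import math


def estimate_max_logs(memstore_heap_bytes, hdfs_block_bytes,
                      roll_fraction=0.95, headroom=1.0):
    """Estimate maxlogs = memstore heap / (HDFS blocksize * 0.95).

    headroom scales the result (1.1 for 10% padding, 2.0 or 3.0 for the
    2x/3x limit suggested in the comment). Rounding up is an assumption.
    """
    wal_size_at_roll = hdfs_block_bytes * roll_fraction
    return max(1, math.ceil(headroom * memstore_heap_bytes / wal_size_at_roll))


if __name__ == "__main__":
    heap = 4 * 1024**3    # hypothetical 4 GiB of memstore heap
    block = 128 * 1024**2  # hypothetical 128 MiB HDFS block size
    for factor in (1.0, 1.1, 2.0, 3.0):
        print(factor, estimate_max_logs(heap, block, headroom=factor))
```

With these example numbers the unpadded estimate is 34 logs, and the 2x variant is 68.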

> Compaction improvements
> -----------------------
>
>                 Key: HBASE-14383
>                 URL: https://issues.apache.org/jira/browse/HBASE-14383
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>             Fix For: 2.0.0
>
>
> Compaction is still a major issue in many production environments. The general recommendation,
> disabling region splitting and major compactions to reduce unpredictable IO/CPU spikes (especially
> during peak times) and running them manually during off-peak times, still does not resolve the
> issues completely.
> h3. Flush storms
> * WAL rolling events across the cluster can be highly correlated, and so are the memstore flushes
> they trigger and the minor compactions that follow, which can be promoted to major ones. These
> events are highly correlated in time if there is a balanced write load on the regions in a table.
> * The same is true for memstore flushing due to the periodic memstore flusher.

> Both of the above may produce *flush storms*, which are as bad as *compaction storms*.
> What can be done here? We can spread these events over time by randomizing (with jitter)
> several config options:
> # hbase.regionserver.optionalcacheflushinterval
> # hbase.regionserver.flush.per.changes
> # hbase.regionserver.maxlogs   
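
The jitter idea in the list above can be sketched as follows. This is a minimal Python illustration, not the HBase implementation: the function name, the uniform distribution, and the 20% jitter fraction are all assumptions. Each region server would draw its own effective value so that flushes and WAL rolls stop lining up across the cluster.

```python
import random


def jittered(base, jitter_fraction, rng=random):
    """Return base shifted by a uniform random offset.

    The offset lies in [-jitter_fraction * base, +jitter_fraction * base].
    """
    offset = (rng.random() * 2.0 - 1.0) * jitter_fraction * base
    return base + offset


if __name__ == "__main__":
    base_interval_ms = 3_600_000  # hypothetical base flush interval (one hour)
    for server in range(3):
        # Each server gets a different effective interval.
        print(server, round(jittered(base_interval_ms, 0.2)))
```

Symmetric jitter keeps the cluster-wide average at the configured value; a one-sided variant (only shortening or only lengthening the interval) would be an equally plausible design.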
> h3. ExploringCompactionPolicy max compaction size
> One more optimization can be added to ExploringCompactionPolicy. To limit the size of a
> compaction, one can use the config parameter hbase.hstore.compaction.max.size. It would be nice
> to have two separate limits: one for peak hours and one for off-peak hours.
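
A rough Python sketch of what two separate limits could look like. All names and parameters here are hypothetical and chosen for illustration; the only piece taken from the description is the idea of a size cap that differs between peak and off-peak hours.

```python
def is_off_peak(hour, start_hour, end_hour):
    """True if hour falls in [start_hour, end_hour), allowing wrap past midnight."""
    if start_hour == end_hour:
        return False  # assumption: an empty window means no off-peak period
    if start_hour < end_hour:
        return start_hour <= hour < end_hour
    return hour >= start_hour or hour < end_hour


def max_compaction_size(hour, peak_limit, off_peak_limit, start_hour, end_hour):
    """Pick the compaction size cap that applies at the given hour (0-23)."""
    if is_off_peak(hour, start_hour, end_hour):
        return off_peak_limit
    return peak_limit


if __name__ == "__main__":
    GIB = 1024**3
    # Hypothetical limits: 1 GiB during peak, 20 GiB between 22:00 and 06:00.
    for hour in (3, 12, 23):
        print(hour, max_compaction_size(hour, 1 * GIB, 20 * GIB, 22, 6))
```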
> h3. ExploringCompactionPolicy selection evaluation algorithm
> Too simple? A selection with more files always wins; a selection of smaller size wins if
> the number of files is the same.
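
The selection rule as stated in the description can be written down in a few lines. This Python sketch only restates that rule; it is not the actual ExploringCompactionPolicy code, and representing a selection as a list of file sizes is an assumption made for illustration.

```python
def is_better_selection(candidate, best):
    """Compare two candidate selections, each a list of file sizes in bytes.

    More files always wins; with equal file counts, the smaller total size wins.
    """
    if len(candidate) != len(best):
        return len(candidate) > len(best)
    return sum(candidate) < sum(best)


if __name__ == "__main__":
    # Three large files beat two small ones under this rule, which is the
    # kind of outcome the "Too simple?" question is pointing at.
    print(is_better_selection([500, 500, 500], [10, 10]))
```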



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
