hbase-issues mailing list archives

From "Vladimir Rodionov (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-14951) Make hbase.regionserver.maxlogs obsolete
Date Tue, 22 Dec 2015 20:49:46 GMT

     [ https://issues.apache.org/jira/browse/HBASE-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Rodionov updated HBASE-14951:
--------------------------------------
    Release Note: 
WAL roll events across a cluster can be highly correlated. Each roll can force memstore
flushes, which trigger minor compactions, some of which can be promoted to major ones. These
events are highly correlated in time if the write load is balanced across the regions of a table.
The default maximum number of WAL files (*hbase.regionserver.maxlogs*), which controls WAL roll
events, is 32. This is too small for many modern deployments.
Now we calculate this value dynamically (unless it is defined by the user), using the following formula:

maxLogs = Math.max(32, HBASE_HEAP_SIZE * memstoreRatio * 2 / LogRollSize), where

memstoreRatio is *hbase.regionserver.global.memstore.size*
LogRollSize is the maximum WAL file size (default 0.95 * HDFS block size)
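
The formula can be sketched in a few lines of Java. This is an illustration only, not the
HBase implementation: the class and method names are hypothetical, the 128 MB HDFS block size
is an assumed value, and rounding to the nearest integer is an assumption, since the note does
not say how the quotient is rounded.

```java
import java.util.OptionalInt;

// Sketch only: class and method names are hypothetical, not HBase's actual code.
class MaxLogsSketch {
    static final int MIN_MAX_LOGS = 32; // lower bound taken from the formula

    // maxLogs = max(32, heapSize * memstoreRatio * 2 / logRollSize).
    // Rounding to the nearest integer is an assumption of this sketch.
    static int computedMaxLogs(long heapBytes, double memstoreRatio, long logRollSizeBytes) {
        long computed = Math.round(heapBytes * memstoreRatio * 2 / logRollSizeBytes);
        return (int) Math.max(MIN_MAX_LOGS, computed);
    }

    // A user-defined hbase.regionserver.maxlogs takes precedence over the computed value.
    static int effectiveMaxLogs(OptionalInt userMaxLogs, long heapBytes,
                                double memstoreRatio, long logRollSizeBytes) {
        return userMaxLogs.orElse(computedMaxLogs(heapBytes, memstoreRatio, logRollSizeBytes));
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        long blockSize = 128L << 20;               // assumed HDFS block size (128 MB)
        long rollSize = (long) (0.95 * blockSize); // default LogRollSize per the note
        for (long heapGib : new long[] {1, 2, 10, 20, 32}) {
            System.out.println(heapGib + "G -> "
                + effectiveMaxLogs(OptionalInt.empty(), heapGib * gib, 0.4, rollSize));
        }
    }
}
```

The absolute numbers depend on the roll size passed in. The figures in the table of this note
(80 for 10G, 256 for 32G) appear to correspond to a roll size of roughly 100 MB rather than
0.95 x 128 MB.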

We want to avoid entirely, or at least minimize, the cases where a RS has to flush memstores
prematurely only because it reached the artificial limit of hbase.regionserver.maxlogs. This is
why the equation includes the 2 x multiplier: it gives a maximum WAL capacity of 2 x RS memstore size.


Runaway WAL files.

The default log rolling period (1h) allows up to 2 x memstore size of data to accumulate in
the WAL. For a 32G heap and all other settings at their defaults, this gives ~26GB of data.
Under heavy write load, the number of WAL files can increase dramatically. The RegionServer
LogRoller archives old WALs periodically. The user has three options to control runaway WALs:
override the default hbase.regionserver.maxlogs, override (decrease) the default
hbase.regionserver.logroll.period, or both.
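
The ~26GB figure follows directly from the 2 x multiplier. A minimal worked example, assuming
32G means 32 GiB and a memstore ratio of 0.4 (the class and method names are hypothetical):

```java
// Sketch only: names are hypothetical; this restates the arithmetic of the paragraph above.
class WalCapacitySketch {
    // Maximum WAL capacity implied by the 2 x multiplier: 2 * heap * memstoreRatio.
    static long walCapacityBytes(long heapBytes, double memstoreRatio) {
        return (long) (2 * heapBytes * memstoreRatio);
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        long capacity = walCapacityBytes(32 * gib, 0.4); // 2 * 32 GiB * 0.4
        System.out.printf("%.1f GiB%n", capacity / (double) gib);
    }
}
```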

For systems with a bursty write load, hbase.regionserver.logroll.period can be decreased. In
this case the maximum number of WAL files is determined by the total size of the memstore
(unflushed data), not by hbase.regionserver.maxlogs. For the majority of applications, however,
the defaults will cause no issues: data is flushed periodically from the memstore, the LogRoller
archives old WAL files, and the system never reaches the new default for
hbase.regionserver.maxlogs unless it is under extreme load for a prolonged period of time. Even
in that case, decreasing hbase.regionserver.logroll.period keeps runaway WAL files under control.

The following table gives the new default maximum number of log files for several Region
Server heap sizes:

heap    memstore (% of heap)    maxLogs
1G      40%                     32
2G      40%                     32
10G     40%                     80
20G     40%                     160
32G     40%                     256



  


> Make hbase.regionserver.maxlogs obsolete
> ----------------------------------------
>
>                 Key: HBASE-14951
>                 URL: https://issues.apache.org/jira/browse/HBASE-14951
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, wal
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>            Priority: Minor
>             Fix For: 2.0.0, 1.2.0, 1.3.0
>
>         Attachments: HBASE-14951-v1.patch, HBASE-14951-v2.patch
>
>
> There was a discussion in HBASE-14388 about the maximum number of log files. It was agreed
> that we should calculate this number in code while still honoring the user's setting.
> The maximum number of log files is now calculated as follows:
>  maxLogs = HEAP_SIZE * memstoreRatio * 2 / LogRollSize



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
