hbase-issues mailing list archives

From "Tianying Chang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
Date Thu, 16 Jun 2016 06:49:05 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333224#comment-15333224 ]

Tianying Chang commented on HBASE-16030:

[~enis] thanks for reviewing the patch. Yes, 5 minutes is not enough; we would like to see
the flushes uniformly distributed across the one hour range in our online-facing production cluster.
I am fine with making this value configurable, so that it can be set larger than 5 min. Would it
cause a problem if a flush request is queued and delayed for up to 1 hour?

BTW, I attached a new graph showing the impact of the hourly spike on network/disk/CPU on
our new 1.2RC test cluster.

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
> --------------------------------------------------------------------------------------------------
>                 Key: HBASE-16030
>                 URL: https://issues.apache.org/jira/browse/HBASE-16030
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 1.2.1
>            Reporter: Tianying Chang
>            Assignee: Tianying Chang
>             Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3
>         Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, hbase-16030.patch
>
> In our production cluster, we observed a memstore flush spike every hour across all regions/RS
> (we use the default memstore periodic flush time of 1 hour).
> This happens when two conditions are met:
> 1. the memstore does not accumulate enough data to be flushed before the 1 hour limit is reached;
> 2. all regions are opened at around the same time (e.g. all RS are started at the same
> time when a cluster is started).
> When both conditions hold, all the regions are flushed at around the same time, at
> startTime+1hour-delay, again and again.
> We added a flush jitter time to randomize the flush time of each region, so that they
> are not all flushed at around the same time. We have had this feature running in our 0.94.7 and
> 0.94.26 clusters. Recently we upgraded to 1.2 and found the issue is still present there, so we
> are porting the change to the 1.2 branch.
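
The jitter idea in the description above can be illustrated with a minimal Java sketch. This is not the attached patch and does not use HBase APIs; the class name, method names, and the maxJitterMs parameter are hypothetical and chosen only for illustration. It assumes every region is opened at the same instant and never fills its memstore, so only the periodic timer triggers a flush.

```java
import java.util.Random;

// Hypothetical illustration only; not HBase code and not the attached patch.
public class FlushJitterSketch {
    // Periodic flush interval of 1 hour, matching the default mentioned in the issue.
    static final long PERIODIC_FLUSH_MS = 60L * 60L * 1000L;

    // Without jitter, every region opened at the same time is due at the same time.
    static long nextFlushWithoutJitter(long regionOpenTimeMs) {
        return regionOpenTimeMs + PERIODIC_FLUSH_MS;
    }

    // With jitter, each region adds a random delay in [0, maxJitterMs),
    // so regions opened together become due at different times.
    static long nextFlushWithJitter(long regionOpenTimeMs, long maxJitterMs, Random rng) {
        long jitter = (long) (rng.nextDouble() * maxJitterMs);
        return regionOpenTimeMs + PERIODIC_FLUSH_MS + jitter;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        long openTime = System.currentTimeMillis(); // all regions opened together
        long maxJitterMs = 5L * 60L * 1000L;        // assumed 5 minute jitter window

        for (int region = 0; region < 5; region++) {
            long plain = nextFlushWithoutJitter(openTime);
            long jittered = nextFlushWithJitter(openTime, maxJitterMs, rng);
            System.out.println("region " + region
                + " flush offset without jitter (ms): " + (plain - openTime)
                + ", with jitter (ms): " + (jittered - openTime));
        }
    }
}
```

A larger maxJitterMs spreads the flushes over a wider window, which is the trade-off discussed in the comment above: a 5 minute window still concentrates the flushes, while a window close to the full hour spreads them roughly uniformly at the cost of delaying some flushes by up to that amount.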

