Date: Tue, 14 Jul 2015 17:59:05 +0000 (UTC)
From: "Lars Hofhansl (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-14058) Stabilizing default heap memory tuner

    [ https://issues.apache.org/jira/browse/HBASE-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626773#comment-14626773 ]

Lars Hofhansl commented on HBASE-14058:
---------------------------------------

bq. IMO it won't be a good idea to use it for stat calculation as it will tamper with the actual trend and stats.

Hmm... We want to capture the trend and not the current state. I.e., if we trend towards write load for a while then we tune for write loads; if we trend towards read loads we tune for that, but only after a bit.

Maybe I'm missing something. Lemme actually look at the patch.
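To make the trend-versus-current-state point concrete, here is a minimal, illustrative sketch (not HBase code; the class name, method names, and decay factor are all invented for this example) of a decaying, signed statistic: a single read- or write-heavy period barely moves it, while a sustained run does.

{code:java}
/**
 * Illustration only: a decaying, signed trend statistic. Recent periods
 * count more than older ones, so one isolated spike fades quickly while a
 * sustained read- or write-heavy run builds up a clear trend.
 */
public class DecayingTrend {
  private final double decay;  // e.g. 0.5: each older period counts half as much
  private double trend = 0.0;  // positive ~ write pressure, negative ~ read pressure

  public DecayingTrend(double decay) {
    this.decay = decay;
  }

  /** signal > 0 for a write-heavy period, < 0 for a read-heavy one, 0 for neutral. */
  public void addPeriod(double signal) {
    trend = decay * trend + signal;
  }

  public double getTrend() {
    return trend;
  }

  public static void main(String[] args) {
    DecayingTrend t = new DecayingTrend(0.5);
    t.addPeriod(1.0);  // one isolated write-heavy period...
    t.addPeriod(0.0);
    t.addPeriod(0.0);
    System.out.println("after a single spike: " + t.getTrend());   // 0.25, mostly faded
    for (int i = 0; i < 5; i++) {
      t.addPeriod(1.0); // ...versus a sustained write-heavy run
    }
    System.out.println("after a sustained run: " + t.getTrend());  // ~1.95, close to the limit of 2.0
  }
}
{code}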
> Stabilizing default heap memory tuner
> -------------------------------------
>
>                 Key: HBASE-14058
>                 URL: https://issues.apache.org/jira/browse/HBASE-14058
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 2.0.0, 1.2.0, 1.3.0
>            Reporter: Abhilash
>            Assignee: Abhilash
>        Attachments: HBASE-14058-v1.patch, HBASE-14058.patch, after_modifications.png, before_modifications.png
>
>
> The memory tuner works well in general cases, but when the workload is both read heavy and write heavy the tuner does too much tuning. We should try to control the number of tuner operations and stabilize it. The main problem was that the tuner considers itself in steady state even if it sees just one neutral tuner period, and thus does too many tuning operations and too many reverts, with large step sizes at that (the step size was set to the maximum even after one neutral period). To stop this I have thought of these steps:
> 1) The band created by μ + δ/2 and μ - δ/2 is too narrow. Statistically ~62% of periods will lie outside this range, which means 62% of the data points are considered either high or low, which is too much. Use μ + δ*0.8 and μ - δ*0.8 instead. In expectation this decreases the number of tuner operations per 100 periods from 19 to just 10. If we use δ/2 then 31% of the data values will be considered high and 31% low (2*0.31*0.31 = 0.19); if we use δ*0.8 then 22% will be low and 22% will be high (2*0.22*0.22 ~ 0.10). A numeric check of these fractions is sketched after the quoted description.
> 2) Define a proper steady state by looking at the past few periods (equal to hbase.regionserver.heapmemory.autotuner.lookup.periods) rather than just the last tuner operation. We say the tuner is in steady state when the last few tuner periods were NEUTRAL. We keep decreasing the step size unless it is extremely low, then leave the system in that state for some time.
> 3) Rather than decreasing the step size only while reverting, decrease its magnitude whenever we are trying to revert tuning done in the last few periods (sum the changes of the last few periods and compare to the current step) rather than just looking at the last period. When its magnitude gets too low, make the tuner step NEUTRAL (no operation). This causes the step size to decrease continuously until we reach steady state. After that the tuning process restarts (the tuner step size resets when we reach steady state).
> 4) The tuning done in the last few periods is tracked as a decaying sum of past tuner steps with sign. This parameter is positive for an increase in memstore and negative for an increase in block cache. We use this rather than an arithmetic mean to give more weight to recent tuner steps. A sketch of this step-size and steady-state logic also follows below.
> Please see the attachments. One shows the size of the memstore (green) and of the block cache (blue) as adjusted by the tuner without these modifications, the other with the modifications. The x-axis is time and the y-axis is the fraction of heap memory available to the memstore and block cache at that time (it always sums to 80%). I configured the min/max ranges for both components to 0.1 and 0.7 respectively (so in the plots the y-axis min and max are 0.1 and 0.7). In both cases the tuner tries to distribute memory by giving ~15% to the memstore and ~65% to the block cache, but the modified tuner does it much more smoothly.
> I got these results from a YCSB test doing approximately 5000 inserts and 500 reads per second (for one region server). The results can be fine tuned further, and the number of tuner operations reduced, with these configuration changes (shown as code at the end of this message):
> For more fine tuning:
> a) lower the max step size (suggested = 4%)
> b) lower the min step size (the default is also fine)
> To further decrease the frequency of tuning operations:
> c) increase the number of lookup periods (in the tests it was just 10, the default is 60)
> d) increase the tuner period (in the tests it was just 20 secs, the default is 60 secs)
> I used a smaller tuner period / fewer lookup periods to get more data points.
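A quick numeric check of the tail fractions used in step 1), assuming the per-period statistic is roughly normal with mean μ and standard deviation δ (which is the reading the quoted percentages imply). Apache Commons Math is used here only to make the arithmetic reproducible; the issue itself does not reference it.

{code:java}
import org.apache.commons.math3.distribution.NormalDistribution;

/**
 * Reproduces the percentages from step 1), assuming the per-period statistic
 * is approximately N(mu, delta^2). Commons Math supplies the normal CDF; it
 * is not implied to be used by the patch itself.
 */
public class TunerThresholdCheck {
  public static void main(String[] args) {
    NormalDistribution stdNormal = new NormalDistribution(); // standard normal

    for (double k : new double[] {0.5, 0.8}) {
      double tail = 1.0 - stdNormal.cumulativeProbability(k); // P(X > mu + k*delta)
      double outside = 2.0 * tail;                            // P(|X - mu| > k*delta)
      System.out.printf("k = %.1f: %.0f%% high, %.0f%% low, %.0f%% outside, 2*p*p = %.2f%n",
          k, 100 * tail, 100 * tail, 100 * outside, 2 * tail * tail);
    }
    // Prints (rounded):
    //   k = 0.5: 31% high, 31% low, 62% outside, 2*p*p = 0.19
    //   k = 0.8: 21% high, 21% low, 42% outside, 2*p*p = 0.09
    // i.e. essentially the ~62%/19 and ~22%/~0.10 figures quoted above.
  }
}
{code}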
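The step-size and steady-state rules in steps 2)-4) can be summarized in a few lines. This is an illustrative sketch only: all names and constants are invented here, and the actual patch against the default heap memory tuner differs in detail.

{code:java}
/**
 * Illustrative sketch of steps 2)-4): shrink the step when it reverts recent
 * tuning, go NEUTRAL when the step becomes negligible, and reset once a run
 * of NEUTRAL periods signals steady state. Names and constants are invented.
 */
public class StepSizeSketch {
  private static final double MAX_STEP = 0.04;   // 4%, the suggested max step size
  private static final double MIN_STEP = 0.005;  // below this the step is treated as NEUTRAL

  private final int lookupPeriods;               // hbase.regionserver.heapmemory.autotuner.lookup.periods
  private double stepSize = MAX_STEP;
  private double decayingSumOfSteps = 0.0;       // step 4: positive = memstore grew recently
  private int consecutiveNeutralPeriods = 0;

  public StepSizeSketch(int lookupPeriods) {
    this.lookupPeriods = lookupPeriods;
  }

  /**
   * direction: +1 to grow the memstore, -1 to grow the block cache, 0 for a
   * neutral period. Returns the signed step to apply (0 means no tuning).
   */
  public double nextStep(int direction) {
    if (direction == 0) {
      return neutralPeriod();
    }
    consecutiveNeutralPeriods = 0;
    // Step 3: if this step would undo what the last few periods did, shrink it.
    if (direction * decayingSumOfSteps < 0) {
      stepSize /= 2;
    }
    if (stepSize < MIN_STEP) {
      return neutralPeriod();  // magnitude too low to matter: treat as NEUTRAL
    }
    double step = direction * stepSize;
    // Step 4: decaying, signed memory of recent steps (recent steps dominate).
    decayingSumOfSteps = decayingSumOfSteps / 2 + step;
    return step;
  }

  private double neutralPeriod() {
    consecutiveNeutralPeriods++;
    // Step 2: only a run of NEUTRAL periods counts as steady state; once there,
    // the step size resets so the next sustained imbalance is handled quickly.
    if (consecutiveNeutralPeriods >= lookupPeriods) {
      stepSize = MAX_STEP;
      decayingSumOfSteps = 0.0;
      consecutiveNeutralPeriods = 0;
    }
    return 0.0;
  }
}
{code}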
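For completeness, the fine-tuning suggestions a)-d) expressed as configuration overrides. Only the lookup-periods key is named in the issue text; the other three property names (and the assumption that the tuner period is set in milliseconds) are guesses for illustration and should be checked against the HBase version in use.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

/**
 * Fine-tuning suggestions a)-d) as configuration overrides. Keys marked
 * "assumed" are not named in the issue and may differ between versions.
 */
public class TunerConfigExample {
  public static Configuration tunedConf() {
    Configuration conf = HBaseConfiguration.create();
    // a) lower the maximum tuner step size to the suggested 4% (assumed key)
    conf.setFloat("hbase.regionserver.heapmemory.autotuner.step.max", 0.04f);
    // b) the minimum step size default is fine; shown only for completeness (assumed key and value)
    conf.setFloat("hbase.regionserver.heapmemory.autotuner.step.min", 0.005f);
    // c) more lookup periods = slower but steadier reaction (default 60 per the issue text)
    conf.setInt("hbase.regionserver.heapmemory.autotuner.lookup.periods", 60);
    // d) longer tuner period, default 60 seconds per the issue text (assumed key, assumed millis)
    conf.setLong("hbase.regionserver.heapmemory.tuner.period", 60000L);
    return conf;
  }
}
{code}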