Date: Thu, 4 Aug 2016 07:19:20 +0000 (UTC)
From: "Anoop Sam John (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-16287) LruBlockCache size should not exceed acceptableSize too many

[ https://issues.apache.org/jira/browse/HBASE-16287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407340#comment-15407340 ]

Anoop Sam John commented on HBASE-16287:
----------------------------------------

Nice reviews.
{code}
if (!evictionInProgress) {
  runEviction();
}
{code}
Not just on this patch: we have the boolean 'evictionInProgress' to make sure eviction never runs in parallel. This boolean is also marked volatile, but that still does not guarantee it, unless it is an AtomicBoolean and we use a CAS op. It is OK to check and fix that as part of another Jira, as it is not related to the topic of this Jira.

> LruBlockCache size should not exceed acceptableSize too many
> ------------------------------------------------------------
>
>                 Key: HBASE-16287
>                 URL: https://issues.apache.org/jira/browse/HBASE-16287
>             Project: HBase
>          Issue Type: Improvement
>          Components: BlockCache
>            Reporter: Yu Sun
>            Assignee: Yu Sun
>         Attachments: HBASE-16287-v1.patch, HBASE-16287-v2.patch, HBASE-16287-v3.patch, HBASE-16287-v4.patch, HBASE-16287-v5.patch, HBASE-16287-v6.patch, HBASE-16287-v7.patch, HBASE-16287-v8.patch
>
>
> Our regionserver has the following configuration:
> -Xmn4g -Xms32g -Xmx32g -XX:SurvivorRatio=2 -XX:+UseConcMarkSweepGC
> We also use only the block cache, and set hfile.block.cache.size = 0.3 in hbase-site.xml, so under this configuration the LRU block cache size will be (32g-1g)*0.3=9.3g. But in some scenarios, some of the region servers run into continuous full GCs for hours and, most importantly, after a full GC most of the objects in the old generation are not collected.
> So we dumped the heap and analysed it with MAT, and we observed an obvious memory leak in LruBlockCache, which occupied about 16g of memory. Then we set the log level of class LruBlockCache to TRACE and observed this in the log:
> {quote}
> 2016-07-22 12:17:58,158 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=15.29 GB, freeSize=-5.99 GB, max=9.30 GB, blockCount=628182, accesses=101799469125, hits=93517800259, hitRatio=91.86%, , cachingAccesses=99462650031, cachingHits=93468334621, cachingHitsRatio=93.97%, evictions=238199, evicted=4776350518, evictedPerRun=20051.93359375{quote}
> We can see that the block cache size has exceeded acceptableSize by far too much, which makes the full GC problem more serious.
> After some investigation, I found that in this function:
> {code:borderStyle=solid}
>   public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory,
>       final boolean cacheDataInL1) {
> {code}
> the block is simply put into the cache, no matter how much of the block cache has already been used. If the evict thread is not fast enough, the block cache size will increase significantly.
> So I think we should have a check here. For example, if the block cache size > 1.2 * acceptableSize(), just return and don't put the block into the cache until the block cache size is back under the watermark. If this is reasonable, I can make a small patch for this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
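
For readers of the archive, the two ideas discussed in this thread can be sketched together in a toy class. This is only an illustration under stated assumptions, not the HBASE-16287 patch and not HBase's LruBlockCache: the class BoundedCacheSketch, its methods, and the String/byte[] types are hypothetical, and the 1.2 factor is simply the example value from the issue description. A volatile flag makes the value visible across threads, but "check, then run" is still two steps, so two threads can both see false and both start evicting; compareAndSet on an AtomicBoolean makes the check and the claim a single atomic step.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Toy cache for illustration only; every name in this class is hypothetical. */
final class BoundedCacheSketch {
    // Assumed factor, taken from the "1.2 * acceptableSize()" example in the issue.
    static final double HARD_LIMIT_FACTOR = 1.2;

    private final ConcurrentHashMap<String, byte[]> blocks = new ConcurrentHashMap<>();
    private final AtomicLong size = new AtomicLong();
    private final AtomicBoolean evictionInProgress = new AtomicBoolean(false);
    private final long acceptableSize;

    BoundedCacheSketch(long acceptableSize) {
        this.acceptableSize = acceptableSize;
    }

    /** Claims the eviction slot with a CAS; only one caller wins until endEviction(). */
    boolean beginEviction() {
        return evictionInProgress.compareAndSet(false, true);
    }

    /** Releases the eviction slot. */
    void endEviction() {
        evictionInProgress.set(false);
    }

    /** Evicts unless another run holds the slot; returns true if this call ran. */
    boolean tryRunEviction() {
        if (!beginEviction()) {
            return false;
        }
        try {
            // No LRU ordering here: drop arbitrary entries until under the target size.
            for (String key : blocks.keySet()) {
                if (size.get() <= acceptableSize) {
                    break;
                }
                byte[] removed = blocks.remove(key);
                if (removed != null) {
                    size.addAndGet(-removed.length);
                }
            }
            return true;
        } finally {
            endEviction();
        }
    }

    /** Returns false when the block is skipped because the cache is over the hard limit. */
    boolean cacheBlock(String key, byte[] block) {
        if (size.get() > HARD_LIMIT_FACTOR * acceptableSize) {
            return false; // eviction is lagging: do not grow the cache further
        }
        if (blocks.putIfAbsent(key, block) == null) {
            size.addAndGet(block.length);
        }
        if (size.get() > acceptableSize) {
            tryRunEviction();
        }
        return true;
    }

    long size() {
        return size.get();
    }

    public static void main(String[] args) {
        BoundedCacheSketch cache = new BoundedCacheSketch(100);
        cache.beginEviction(); // simulate an eviction run that is still in flight
        for (int i = 0; i < 5; i++) {
            boolean cached = cache.cacheBlock("block-" + i, new byte[50]);
            System.out.println("block-" + i + " cached=" + cached + " size=" + cache.size());
        }
        cache.endEviction();
        cache.tryRunEviction();
        System.out.println("size after eviction=" + cache.size());
    }
}
```

In this sketch a rejected block is simply not cached, which matches the "just return" behaviour proposed in the issue; whether that trade-off (a lower hit ratio while eviction lags, in exchange for bounded heap use) is acceptable is what the Jira discussion is about.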