Date: Thu, 9 Nov 2017 01:24:00 +0000 (UTC)
From: "Andrew Purtell (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-16287) LruBlockCache size should not exceed acceptableSize too many

     [ https://issues.apache.org/jira/browse/HBASE-16287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-16287:
-----------------------------------
    Fix Version/s:     (was: 1.4.0)

> LruBlockCache size should not exceed acceptableSize too many
> ------------------------------------------------------------
>
>                 Key: HBASE-16287
>                 URL: https://issues.apache.org/jira/browse/HBASE-16287
>             Project: HBase
>          Issue Type: Improvement
>          Components: BlockCache
>            Reporter: Yu Sun
>            Assignee: Yu Sun
>             Fix For: 2.0.0, 1.3.0, 1.2.3
>
>         Attachments: HBASE-16287-v1.patch, HBASE-16287-v2.patch, HBASE-16287-v3.patch, HBASE-16287-v4.patch, HBASE-16287-v5.patch, HBASE-16287-v6.patch, HBASE-16287-v7.patch, HBASE-16287-v8.patch, HBASE-16287-v9.patch
>
>
> Our regionserver has the following configuration:
> -Xmn4g -Xms32g -Xmx32g -XX:SurvivorRatio=2 -XX:+UseConcMarkSweepGC
> We also use only the blockcache and set hfile.block.cache.size = 0.3 in hbase-site.xml, so under this configuration the LRU block cache size will be (32g-1g)*0.3=9.3g.
> But in some scenarios, some of the regionservers run continuous full GCs for hours and, most importantly, most of the objects in the old generation are not collected after a full GC. So we dumped the heap and analysed it with MAT, and we observed an obvious memory leak in LruBlockCache, which occupies about 16g of memory. We then set the log level of class LruBlockCache to TRACE and observed this in the log:
> {quote}
> 2016-07-22 12:17:58,158 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=15.29 GB, freeSize=-5.99 GB, max=9.30 GB, blockCount=628182, accesses=101799469125, hits=93517800259, hitRatio=91.86%, , cachingAccesses=99462650031, cachingHits=93468334621, cachingHitsRatio=93.97%, evictions=238199, evicted=4776350518, evictedPerRun=20051.93359375{quote}
> We can see that the blockcache size has exceeded acceptableSize by far too much, which makes the full GCs even worse.
> After some investigation, I found this function:
> {code:borderStyle=solid}
>   public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory,
>       final boolean cacheDataInL1) {
> {code}
> No matter how much of the blockcache is already in use, it just puts the block into the cache. But if the eviction thread is not fast enough, the blockcache size will grow significantly.
> So I think we should have a check here: for example, if the blockcache size > 1.2 * acceptableSize(), just return and do not cache the block until the blockcache size is back under the watermark. If this is reasonable, I can make a small patch for this.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
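
The check proposed in the description (skip caching while the cache is above 1.2 * acceptableSize()) might look roughly like the sketch below. This is only an illustration under stated assumptions, not the attached patch and not HBase code: the class BoundedCacheSketch, its fields, the String keys and byte[] blocks, and the boolean return value are all hypothetical stand-ins, and the 1.2 multiplier is simply the example value from the description.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a block cache; NOT HBase's LruBlockCache. */
final class BoundedCacheSketch {
    // Assumed multiplier, taken from the example in the issue description.
    private static final double HARD_LIMIT_FACTOR = 1.2;

    private final long maxSize;            // configured cache capacity in bytes
    private final double acceptableFactor; // fraction of maxSize treated as acceptable
    private final AtomicLong size = new AtomicLong();
    private final ConcurrentHashMap<String, byte[]> blocks = new ConcurrentHashMap<>();

    BoundedCacheSketch(long maxSize, double acceptableFactor) {
        this.maxSize = maxSize;
        this.acceptableFactor = acceptableFactor;
    }

    long acceptableSize() {
        return (long) Math.floor(maxSize * acceptableFactor);
    }

    long currentSize() {
        return size.get();
    }

    /** Returns true if the block is in the cache afterwards, false if it was rejected. */
    boolean cacheBlock(String key, byte[] block) {
        long hardLimit = (long) (HARD_LIMIT_FACTOR * acceptableSize());
        if (size.get() > hardLimit) {
            // Eviction is lagging behind: skip caching instead of growing further.
            return false;
        }
        // Only count the bytes if the key was not cached already.
        if (blocks.putIfAbsent(key, block) == null) {
            size.addAndGet(block.length);
        }
        return true;
    }

    /** Stand-in for the eviction thread: drops one block and releases its bytes. */
    void evict(String key) {
        byte[] removed = blocks.remove(key);
        if (removed != null) {
            size.addAndGet(-removed.length);
        }
    }
}
```

With maxSize=1000 and acceptableFactor=0.5 the hard limit is 600 bytes, so inserts are accepted until the size passes 600 and are rejected from then on until eviction brings the size back down. Because the check happens before the insert, the cache can still overshoot the limit by one block per concurrent caller; the point is only that growth is bounded while eviction catches up.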