Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id E249D200C38 for ; Wed, 1 Mar 2017 06:10:49 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id E0DCA160B7E; Wed, 1 Mar 2017 05:10:49 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 3694F160B7C for ; Wed, 1 Mar 2017 06:10:49 +0100 (CET) Received: (qmail 94152 invoked by uid 500); 1 Mar 2017 05:10:48 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 94128 invoked by uid 99); 1 Mar 2017 05:10:48 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 01 Mar 2017 05:10:48 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 8ADAF1A02EE for ; Wed, 1 Mar 2017 05:10:47 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -1.547 X-Spam-Level: X-Spam-Status: No, score=-1.547 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RP_MATCHES_RCVD=-2.999, SPF_NEUTRAL=0.652] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id AmjKrnuAiKJA for ; Wed, 1 Mar 2017 05:10:46 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 3CC3960DC0 for ; Wed, 1 Mar 2017 05:10:46 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 9D13DE00D3 for ; Wed, 1 Mar 2017 05:10:45 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 5470124156 for ; Wed, 1 Mar 2017 05:10:45 +0000 (UTC) Date: Wed, 1 Mar 2017 05:10:45 +0000 (UTC) From: "ramkrishna.s.vasudevan (JIRA)" To: issues@hbase.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Wed, 01 Mar 2017 05:10:50 -0000 [ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889508#comment-15889508 ] ramkrishna.s.vasudevan commented on HBASE-16630: ------------------------------------------------ Pushed to branch-1, branch-1.2 and branch-1.3 branches. Thanks for the update [~dvdreddy]. Will mark it as resovled now. > Fragmentation in long running Bucket Cache > ------------------------------------------ > > Key: HBASE-16630 > URL: https://issues.apache.org/jira/browse/HBASE-16630 > Project: HBase > Issue Type: Bug > Components: BucketCache > Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3 > Reporter: deepankar > Assignee: deepankar > Priority: Critical > Attachments: 16630-v2-suggest.patch, 16630-v3-suggest.patch, HBASE-16630.patch, HBASE-16630-v2.patch, HBASE-16630-v3-branch-1.patch, HBASE-16630-v3-branch-1.X.patch, HBASE-16630-v3.patch > > > As we are running bucket cache for a long time in our system, we are observing cases where some nodes after some time does not fully utilize the bucket cache, in some cases it is even worse in the sense they get stuck at a value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables are configured in-memory for simplicity sake). > We took a heap dump and analyzed what is happening and saw that is classic case of fragmentation, current implementation of BucketCache (mainly BucketAllocator) relies on the logic that fullyFreeBuckets are available for switching/adjusting cache usage between different bucketSizes . But once a compaction / bulkload happens and the blocks are evicted from a bucket size , these are usually evicted from random places of the buckets of a bucketSize and thus locking the number of buckets associated with a bucketSize and in the worst case of the fragmentation we have seen some bucketSizes with occupancy ratio of < 10 % But they dont have any completelyFreeBuckets to share with the other bucketSize. > Currently the existing eviction logic helps in the cases where cache used is more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also done, the eviction (freeSpace function) will not evict anything and the cache utilization will be stuck at that value without any allocations for other required sizes. > The fix for this we came up with is simple that we do deFragmentation ( compaction) of the bucketSize and thus increasing the occupancy ratio and also freeing up the buckets to be fullyFree, this logic itself is not complicated as the bucketAllocator takes care of packing the blocks in the buckets, we need evict and re-allocate the blocks for all the BucketSizes that dont fit the criteria. > I am attaching an initial patch just to give an idea of what we are thinking and I'll improve it based on the comments from the community. -- This message was sent by Atlassian JIRA (v6.3.15#6346)