Date: Tue, 19 Jan 2016 08:02:39 +0000 (UTC)
From: "deepankar (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-15101) Leaked References to StoreFile.Reader after HBASE-13082

    [ https://issues.apache.org/jira/browse/HBASE-15101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106413#comment-15106413 ]

deepankar commented on HBASE-15101:
-----------------------------------

My understanding is that before HBASE-13082, while a compaction was running, its output files sat in the .tmp directory (of the region folder) and were finalized only once the compaction completed. That left only a very small window (after moving the new files in from .tmp and before moving the old files out of the RegionServer) in which all the files could be present at once.
This is not the case after HBASE-13082, because both sets of files are present in the folder for a longer period of time, and if there is any leak in the reference counting then all the files coexist, which can lead to a region size explosion. This is exactly what happened to us: without this patch, we were running one regionserver with HBASE-13082, and almost all the regions on that server had all the files created since that regionserver started or since the region was moved to that server (movement rarely happens). Worse, we force a major compaction of our regions daily, which led to the region data being repeated over 7 times. When, in a panic, we shut down this server (gracefully), the other regionservers that took over these regions kept compacting for the whole next day (as each region contained 5-7x the data of a normal region).

We then applied this patch and hosted only two regions on this experimental regionserver for 2 days, and the same thing happened: when we shut down the regionserver again (again gracefully), all the files remained in the directory, and it led to a longer compaction the next time. If we can come up with a patch for the leak, maybe I could take a stab at testing again. I will also go through close() to see if I am missing anything.
Thanks

> Leaked References to StoreFile.Reader after HBASE-13082
> -------------------------------------------------------
>
>                 Key: HBASE-15101
>                 URL: https://issues.apache.org/jira/browse/HBASE-15101
>             Project: HBase
>          Issue Type: Bug
>          Components: HFile, io
>    Affects Versions: 2.0.0
>            Reporter: deepankar
>            Assignee: deepankar
>         Attachments: HBASE-15101-v1.patch, HBASE-15101-v2.patch, HBASE-15101-v3.patch, HBASE-15101.patch
>
>
> We observed in production that after a region server dies there is a huge number of hfiles in its regions when the region server is running a version with HBASE-13082. The doc says this is expected to happen, but we found one place where scanners are not being closed. If the scanners are not closed, their references are not decremented, and that leads to a huge number of store files not being finalized.
> All I was able to find is in selectScannersFrom, where we discard some of the scanners and do not close them. I am attaching a patch for that.
> Also, to avoid these issues, should the files that are done be logged and finalized (moved to archive) as part of the region close operation? This would take care of any leaks that can happen; would it cause any dire consequences?
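
For anyone following the thread, the leak described in the issue can be pictured with a small standalone sketch. This is not HBase code: `Reader`, `Scanner`, `selectScanners`, and the `closeDiscarded` flag are hypothetical stand-ins chosen only to illustrate the idea that a scanner dropped during selection must still be closed, otherwise its file's reference count never returns to zero and the compacted file cannot be archived.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Illustrative sketch only; all names here are hypothetical, not HBase's API.
final class RefCountLeakSketch {

    /** Stand-in for a store file reader that counts the scanners open on it. */
    public static final class Reader {
        public final String fileName;
        private final AtomicInteger refCount = new AtomicInteger();

        public Reader(String fileName) { this.fileName = fileName; }

        /** Opening a scanner takes one reference on the file. */
        public Scanner openScanner() {
            refCount.incrementAndGet();
            return new Scanner(this);
        }

        public int getRefCount() { return refCount.get(); }

        /** Assumed rule: a compacted-away file is archivable only at zero references. */
        public boolean canArchive() { return refCount.get() == 0; }
    }

    /** Stand-in for a store file scanner; close() releases its reference once. */
    public static final class Scanner implements AutoCloseable {
        private final Reader reader;
        private boolean closed;

        Scanner(Reader reader) { this.reader = reader; }

        public Reader getReader() { return reader; }

        @Override
        public void close() {
            if (!closed) {               // idempotent: decrement at most once
                closed = true;
                reader.refCount.decrementAndGet();
            }
        }
    }

    /**
     * Keeps the scanners whose file passes the filter. With closeDiscarded=false
     * the rejected scanners are simply dropped (the leak); with true they are
     * closed so their references are released.
     */
    public static List<Scanner> selectScanners(List<Scanner> all,
                                               Predicate<Reader> wanted,
                                               boolean closeDiscarded) {
        List<Scanner> kept = new ArrayList<>();
        for (Scanner s : all) {
            if (wanted.test(s.getReader())) {
                kept.add(s);
            } else if (closeDiscarded) {
                s.close();
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Reader oldFile = new Reader("old-hfile");
        Reader newFile = new Reader("new-hfile");
        List<Scanner> all = Arrays.asList(oldFile.openScanner(), newFile.openScanner());

        // Discard the scanner on the old file, closing it as we go.
        List<Scanner> kept = selectScanners(all, r -> r.fileName.equals("new-hfile"), true);
        System.out.println("kept=" + kept.size()
            + " oldFile.canArchive=" + oldFile.canArchive());
    }
}
```

Calling `selectScanners` with `closeDiscarded` set to `false` leaves the old file's count at 1 forever, which is the situation where every generation of files stays in the region directory.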