Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Tue, 19 Jan 2016 08:02:39 +0000 (UTC)
From: "deepankar (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12929973.1452723158000.145806.1453190559852@Atlassian.JIRA>
In-Reply-To: <JIRA.12929973.1452723158000@Atlassian.JIRA>
References: <JIRA.12929973.1452723158000@Atlassian.JIRA>
 <JIRA.12929973.1452723158572@arcas>
Subject: [jira] [Commented] (HBASE-15101) Leaked References to
 StoreFile.Reader after HBASE-13082
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-15101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106413#comment-15106413 ] 

deepankar commented on HBASE-15101:
-----------------------------------

My understanding is that before HBASE-13082, while a compaction was in progress the new files sat in the .tmp directory (of the region folder) and were finalized only once it completed. That left a very small window (after the new files are moved in from .tmp and before the old files are moved out of the RegionServer) in which all the files could be present together. This is no longer the case after HBASE-13082: both sets of files stay in the folder for a longer period of time, and if there is any leak in the reference counting, all the files coexist, which can lead to a region size explosion.

This is exactly what happened to us. Without this patch, we were running one regionserver with HBASE-13082, and almost all the regions on that server still had every file created since the regionserver started or since the region was moved to that server (moves rarely happen). Worse, we force a major compaction of regions daily, which led to the region data being repeated over 7 times. When, in a panic, we shut this server down (gracefully), the other regionservers that then hosted these regions kept compacting for the whole next day (as each of them contained 5-7x the data of a normal region).

We then applied this patch and hosted only two regions on this experimental regionserver for 2 days, and the same thing happened again: when we shut the regionserver down (again gracefully), all the files remained in the directory, and that led to a longer compaction the next time.

If we can come up with a patch for the remaining leak, I could take a stab at testing it again. I will also go through close() to see if I am missing anything.

Thanks


> Leaked References to StoreFile.Reader after HBASE-13082
> -------------------------------------------------------
>
>                 Key: HBASE-15101
>                 URL: https://issues.apache.org/jira/browse/HBASE-15101
>             Project: HBase
>          Issue Type: Bug
>          Components: HFile, io
>    Affects Versions: 2.0.0
>            Reporter: deepankar
>            Assignee: deepankar
>         Attachments: HBASE-15101-v1.patch, HBASE-15101-v2.patch, HBASE-15101-v3.patch, HBASE-15101.patch
>
>
> We observed in production that after a region server dies there is a huge number of hfiles in the regions of the region server running the version with HBASE-13082. The doc says this is expected to happen, but we found one place where scanners are not being closed. If the scanners are not closed, their references are not decremented, and that leads to a huge number of store files not being finalized.
> All I was able to find is in selectScannersFrom, where we discard some of the scanners without closing them. I am attaching a patch for that.
> Also, to avoid these issues, should the files that are done be logged and finalized (moved to archive) as part of the region close operation? This would solve any leaks that can happen and would not cause any dire consequences?
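
To illustrate the leak described in the issue, here is a minimal Java sketch. It assumes a simple model in which each open scanner holds one reference on its file reader, and a compacted file can be archived only when that count is zero. The classes Reader and Scanner and the method selectScanners are hypothetical stand-ins, not HBase's actual StoreFile.Reader or selectScannersFrom code. The point it shows is that a scanner dropped during selection must be closed, otherwise its reader's count never returns to zero.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical stand-ins for illustration only; these are not HBase classes.
final class RefCountSketch {

    /** Stand-in for a store file reader that counts the scanners open on it. */
    static final class Reader {
        private final String name;
        private final AtomicInteger refCount = new AtomicInteger();

        Reader(String name) { this.name = name; }

        String name() { return name; }

        int refCount() { return refCount.get(); }

        /** Assumed rule: a compacted file may be archived only with no open scanners. */
        boolean safeToArchive() { return refCount.get() == 0; }

        /** Opening a scanner takes one reference on this reader. */
        Scanner openScanner() {
            refCount.incrementAndGet();
            return new Scanner(this);
        }
    }

    /** Stand-in scanner; close() releases its reference exactly once. */
    static final class Scanner implements AutoCloseable {
        private final Reader reader;
        private boolean closed;

        Scanner(Reader reader) { this.reader = reader; }

        Reader reader() { return reader; }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                reader.refCount.decrementAndGet();
            }
        }
    }

    /**
     * Keeps the scanners accepted by the predicate and closes the rest.
     * Dropping a rejected scanner without closing it would leak a reference.
     */
    static List<Scanner> selectScanners(List<Scanner> all, Predicate<Scanner> wanted) {
        List<Scanner> kept = new ArrayList<>();
        for (Scanner s : all) {
            if (wanted.test(s)) {
                kept.add(s);
            } else {
                s.close(); // release the reference held by the discarded scanner
            }
        }
        return kept;
    }
}
```

In this model, removing the s.close() call in selectScanners would leave the discarded reader's count above zero indefinitely, so safeToArchive() would never return true for that file.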


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)