hbase-issues mailing list archives

From "deepankar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15101) Leaked References to StoreFile.Reader after HBASE-13082
Date Tue, 19 Jan 2016 08:02:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106413#comment-15106413 ]

deepankar commented on HBASE-15101:

As I understood it, before HBASE-13082 the files written by a compaction stayed in the .tmp
directory (of the region folder) while the compaction ran and were finalized only once it
completed. That left a very small window (after moving the new files in from .tmp and before
moving the old files out of the RegionServer) in which all the files were present. This is no
longer the case after HBASE-13082: both sets of files stay in the folder for a longer period,
and if there is any leak in the reference counting, all the files coexist, which can lead to a
region size explosion.
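
To make the failure mode concrete, here is a minimal sketch of reference-count-gated cleanup. It is not the HBase code: TrackedFile, CompactedFileSweepSketch and sweep() are hypothetical names invented for illustration, and the only assumption taken from the discussion is that a compacted-away file may be archived only once nothing references it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a store file whose reader is reference counted. */
final class TrackedFile {
    final String name;
    final boolean compactedAway;                       // true once a compaction replaced it
    private final AtomicInteger refCount = new AtomicInteger();

    TrackedFile(String name, boolean compactedAway) {
        this.name = name;
        this.compactedAway = compactedAway;
    }

    void acquire() { refCount.incrementAndGet(); }     // a scanner starts using the file
    void release() { refCount.decrementAndGet(); }     // the scanner is closed
    int refCount() { return refCount.get(); }
}

/** Hypothetical cleaner: archives only compacted-away files that nobody references. */
final class CompactedFileSweepSketch {
    /** Removes archivable files from {@code files} and returns their names. */
    static List<String> sweep(List<TrackedFile> files) {
        List<String> archived = new ArrayList<>();
        files.removeIf(f -> {
            if (f.compactedAway && f.refCount() == 0) {
                archived.add(f.name);
                return true;
            }
            return false;                              // live file, or still referenced
        });
        return archived;
    }
}
```

In this sketch a single acquire() without a matching release() keeps a compacted-away file in the list on every later sweep, so old and new files pile up side by side.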

This is exactly what happened to us. Without this patch we were running one regionserver
with HBASE-13082, and almost all the regions on that server still had every file created since
the regionserver started or since the region was moved to that server (movement rarely happens).
Worse, we force a major compaction of regions daily, which led to the region data being
repeated over 7 times. When, in a panic, we shut down this server (gracefully), the other
regionservers that then hosted these regions kept compacting for the whole next day (as each
region contained 5-7x the data of a normal region).

We then applied this patch and hosted only two regions on this experimental regionserver
for 2 days. The same thing happened again: when we shut down the regionserver (again gracefully),
all the files remained in the directory, which led to a longer compaction the next time.

If we can come up with a patch once the leak is found, I could take a stab at testing again. I
will also go through close() to see if I am missing anything.


> Leaked References to StoreFile.Reader after HBASE-13082
> -------------------------------------------------------
>                 Key: HBASE-15101
>                 URL: https://issues.apache.org/jira/browse/HBASE-15101
>             Project: HBase
>          Issue Type: Bug
>          Components: HFile, io
>    Affects Versions: 2.0.0
>            Reporter: deepankar
>            Assignee: deepankar
>         Attachments: HBASE-15101-v1.patch, HBASE-15101-v2.patch, HBASE-15101-v3.patch,
> We observed in production that after a region server dies there is a huge number of
> hfiles in the regions of the region server running the version with HBASE-13082. The doc
> says this is expected to happen, but we found one place where scanners are not
> being closed. If the scanners are not closed, their references are not decremented, and that
> leads to a huge number of store files not being finalized.
> All I was able to find is in selectScannersFrom, where we discard some of the scanners
> without closing them. I am attaching a patch for that.
> Also, to avoid these issues, should the files that are done be logged and finalized (moved
> to archive) as part of the region close operation? This would solve any leaks that can happen;
> would it cause any dire consequences?
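
The kind of fix described above (closing the scanners that a selection step discards) might look roughly like the sketch below. This is not the attached patch and not the real selectScannersFrom: SketchReader, SketchScanner, selectScanners() and the wanted flag are hypothetical, and the sketch assumes only that opening a scanner takes a reference on the file's reader and that close() returns it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class ScannerSelectionSketch {

    /** Hypothetical reference-counted reader for one store file. */
    static final class SketchReader {
        private final AtomicInteger refCount = new AtomicInteger();
        int refCount() { return refCount.get(); }
    }

    /** Hypothetical scanner: takes a reference when opened, returns it on close(). */
    static final class SketchScanner implements AutoCloseable {
        private final SketchReader reader;
        private final boolean wanted;   // stand-in for the real selection criteria
        private boolean closed;

        SketchScanner(SketchReader reader, boolean wanted) {
            this.reader = reader;
            this.wanted = wanted;
            reader.refCount.incrementAndGet();
        }

        boolean wanted() { return wanted; }

        @Override
        public void close() {
            if (!closed) {              // idempotent, so a double close cannot under-count
                closed = true;
                reader.refCount.decrementAndGet();
            }
        }
    }

    /** Keeps the wanted scanners and closes every scanner it discards. */
    static List<SketchScanner> selectScanners(List<SketchScanner> all) {
        List<SketchScanner> kept = new ArrayList<>();
        for (SketchScanner s : all) {
            if (s.wanted()) {
                kept.add(s);
            } else {
                s.close();              // without this call the reference would leak
            }
        }
        return kept;
    }
}
```

The caller still owns the returned scanners and has to close them when it is done; the sketch only covers the ones dropped during selection.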

This message was sent by Atlassian JIRA
