hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
Date Tue, 11 Nov 2014 19:53:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206941#comment-14206941 ]

Lars Hofhansl commented on HBASE-12457:

Right. The timing is hard though. It seems the master considers the region closed once it
has sent the CLOSE.

One option I thought about is for HRegion.doClose() to interrupt any running compactions
(i.e. interrupt the CompactSplitThread). Then, upon receiving an InterruptedException, the
compactor would recheck writestate.writesEnabled rather than waiting for the next 10 MB chunk
to finish writing.
The symptom here looks like the compactor just hanging in some IO (either scanner.next or
writer.append - my bet is on the latter). An interrupt can break out of that and allow the
compactor to recheck the condition.
Might be easiest to explain with a patch. :)
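
Until there is a patch, here is a rough standalone sketch of the idea. It is not HBase code:
the class names, closeRegion() and the 60 second sleep are all made up for illustration, and
only the writesEnabled flag name comes from the comment above. It also assumes the blocking
call actually responds to Thread.interrupt(), which a real writer.append may or may not do.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in classes; none of this is the real HRegion or compactor.
public class InterruptibleCompactionSketch {

    /** Stand-in for the region's write state. */
    static final class WriteState {
        volatile boolean writesEnabled = true;
    }

    /** Toy compactor: each "chunk" is one slow blocking call. */
    static final class Compactor implements Runnable {
        private final WriteState writestate;
        final CountDownLatch started = new CountDownLatch(1);
        volatile boolean aborted = false;
        volatile int chunksWritten = 0;

        Compactor(WriteState writestate) {
            this.writestate = writestate;
        }

        @Override
        public void run() {
            started.countDown();
            // The flag is only looked at between chunks, so without an
            // interrupt a close has to wait for the current chunk to finish.
            while (writestate.writesEnabled) {
                try {
                    Thread.sleep(60_000); // simulated slow IO for one chunk
                    chunksWritten++;
                } catch (InterruptedException e) {
                    // Interrupted mid-chunk: fall through so the loop
                    // condition rechecks writesEnabled right away.
                }
            }
            aborted = true;
        }
    }

    /** Toy version of the close path: disable writes first, then interrupt. */
    static long closeRegion(WriteState ws, Thread compactionThread)
            throws InterruptedException {
        long start = System.nanoTime();
        ws.writesEnabled = false;
        compactionThread.interrupt();
        compactionThread.join();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        WriteState ws = new WriteState();
        Compactor compactor = new Compactor(ws);
        Thread t = new Thread(compactor, "compaction-sketch");
        t.start();
        compactor.started.await();
        long ms = closeRegion(ws, t);
        System.out.println("aborted=" + compactor.aborted
                + " chunksWritten=" + compactor.chunksWritten
                + " closeMs=" + ms);
    }
}
```

The ordering in closeRegion() matters: the flag is cleared before the interrupt, so when the
compactor wakes up and rechecks the condition it sees writes disabled and stops instead of
starting another chunk.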

> Regions in transition for a long time when CLOSE interleaves with a slow compaction
> -----------------------------------------------------------------------------------
>                 Key: HBASE-12457
>                 URL: https://issues.apache.org/jira/browse/HBASE-12457
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.7
>            Reporter: Lars Hofhansl
>         Attachments: 12457-minifix.txt
> Under heavy load we have observed regions remaining in transition for 20 minutes when
> the master requests a close while a slow compaction is running.
> The pattern is always something like this:
> # RS starts a compaction
> # HM requests the region to be closed on this RS
> # Compaction is not aborted for another 20 minutes
> # The region is in transition and not usable.
> In every case I tracked down so far the time between the requested CLOSE and abort of
> the compaction is almost exactly 20 minutes, which is suspicious.
> Of course part of the issue is having compactions that take over 20 minutes, but maybe
> we can do better here.

This message was sent by Atlassian JIRA
