hbase-issues mailing list archives

From "Prakash Khemani (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-3822) region server stuck in waitOnAllRegionsToClose
Date Tue, 26 Apr 2011 18:17:04 GMT
region server stuck in waitOnAllRegionsToClose

                 Key: HBASE-3822
                 URL: https://issues.apache.org/jira/browse/HBASE-3822
             Project: HBase
          Issue Type: Bug
            Reporter: Prakash Khemani

The region server is not able to exit because its main thread is stuck here:

"regionserver60020" prio=10 tid=0x00002ab2b039e000 nid=0x760a waiting on condition [0x000000004365e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:126)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.waitOnAllRegionsToClose(HRegionServer.java:736)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:689)
        at java.lang.Thread.run(Thread.java:619)


In CloseRegionHandler.process() we do not call removeFromOnlineRegions() if there is an exception,
so the region stays in the online-regions set and the wait above never finishes.
(In this case I suspect there was a log-rolling exception caused by another issue.)

    // Close the region
    try {
      // TODO: If we need to keep updating CLOSING stamp to prevent against
      // a timeout if this is long-running, need to spin up a thread?
      if (region.close(abort) == null) {
        // This region got closed.  Most likely due to a split. So instead
        // of doing the setClosedState() below, let's just ignore and continue.
        // The split message will clean up the master state.
        LOG.warn("Can't close region: was already closed during close(): " +
          regionInfo.getRegionNameAsString());
        return;
      }
    } catch (IOException e) {
      LOG.error("Unrecoverable exception while closing region " +
        regionInfo.getRegionNameAsString() + ", still finishing close", e);
      // ... (remainder of the handler omitted in this excerpt)
    }



I think that once we set the closing flag on the region it stops taking requests, so it is
as good as offline.
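
If that reasoning holds, the wait loop could treat a region that is already closing as offline. Below is a minimal sketch of that idea, not the actual HRegionServer code: the Region class, the hasRegionsStillServing() helper, and the timeout parameter are all hypothetical names introduced for illustration, and the bounded wait is an extra assumption that the original loop does not have.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class WaitOnRegionsSketch {
  /** Hypothetical stand-in for a region; only the closing flag matters here. */
  static final class Region {
    private volatile boolean closing;
    void markClosing() { closing = true; }
    boolean isClosing() { return closing; }
  }

  /** Returns true if at least one online region has not started closing yet. */
  static boolean hasRegionsStillServing(Map<String, Region> onlineRegions) {
    for (Region r : onlineRegions.values()) {
      if (!r.isClosing()) {
        return true;
      }
    }
    return false;
  }

  /**
   * Sleeps in short intervals until every online region is at least closing,
   * or until timeoutMs elapses (assumption: a bounded wait is acceptable).
   * Returns true if the wait finished, false on timeout or interrupt.
   */
  static boolean waitOnAllRegionsToClose(Map<String, Region> onlineRegions,
                                         long timeoutMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (hasRegionsStillServing(onlineRegions)) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        return false;
      }
    }
    return true;
  }

  /** Convenience factory for an empty online-regions map. */
  static Map<String, Region> newOnlineRegions() {
    return new ConcurrentHashMap<>();
  }
}
```

With this check, a region whose close failed after the closing flag was set no longer blocks shutdown, even though it is still listed in the map.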

Either we should refine the check in waitOnAllRegionsToClose(), or CloseRegionHandler.process()
should remove the region from the online-regions set even when the close throws.
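
The second option can be sketched with a try/finally so the removal runs on both the success and the exception path. This is only an illustration under stated assumptions, not the real handler: CloseHandlerSketch, its Region interface, and the Set-based bookkeeping are hypothetical, and whether it is safe to drop a region whose close failed depends on the closing-flag argument above.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class CloseHandlerSketch {
  /** Hypothetical stand-in for the region being closed. */
  interface Region {
    String name();
    void close() throws IOException;
  }

  // Hypothetical online-regions bookkeeping, keyed by region name.
  private final Set<String> onlineRegions = ConcurrentHashMap.newKeySet();

  void addToOnlineRegions(String name) { onlineRegions.add(name); }

  boolean isOnline(String name) { return onlineRegions.contains(name); }

  /**
   * Closes the region and always removes it from the online set,
   * even if close() throws. Returns true only if the close succeeded.
   */
  boolean process(Region region) {
    try {
      region.close();
      return true;
    } catch (IOException e) {
      System.err.println("Exception while closing region "
          + region.name() + ", still finishing close: " + e);
      return false;
    } finally {
      // Runs on both paths, so a failed close cannot leave the region listed.
      onlineRegions.remove(region.name());
    }
  }
}
```

Putting the removal in finally keeps the shutdown wait from depending on every close succeeding.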
