hbase-issues mailing list archives

From "Ming Ma (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-4397) "-ROOT-", ".META." table stay offline for too long in the case of all RSs are shutdown at the same time
Date Fri, 30 Dec 2011 07:31:30 GMT

     [ https://issues.apache.org/jira/browse/HBASE-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated HBASE-4397:

    Attachment: HBASE-4397-0.92.patch

There are two ways to address the issue.

1. One way is to add special handling for the "-ROOT-" and ".META." tables.
2. The other is to handle the scenario "all RSs come back online while the master has stayed up the whole time" for all regions.

The patch uses the second approach.
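
As an illustration of the second approach only (this is not the code in HBASE-4397-0.92.patch), the idea can be sketched as: regions that could not be assigned because no RS was online are remembered, and they are retried as soon as any RS reports in. The class PendingAssignments and all of its methods are hypothetical stand-ins, not HBase's AssignmentManager or ServerManager API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the master's bookkeeping; not an HBase class.
class PendingAssignments {
    private final Set<String> onlineServers = new LinkedHashSet<>();
    private final Set<String> unassignedRegions = new LinkedHashSet<>();
    private final List<String> assignmentLog = new ArrayList<>();

    // Try to assign a region. If no server is online, park the region
    // instead of leaving it for a long timeout to pick up.
    synchronized boolean assign(String region) {
        if (onlineServers.isEmpty()) {
            unassignedRegions.add(region);
            return false;
        }
        String server = onlineServers.iterator().next();
        assignmentLog.add(region + " -> " + server);
        unassignedRegions.remove(region);
        return true;
    }

    // Called when a region server comes online: retry every parked region.
    synchronized void serverOnline(String server) {
        onlineServers.add(server);
        // Iterate over a copy because assign() mutates the parked set.
        for (String region : new ArrayList<>(unassignedRegions)) {
            assign(region);
        }
    }

    synchronized void serverOffline(String server) {
        onlineServers.remove(server);
    }

    synchronized List<String> assignments() {
        return new ArrayList<>(assignmentLog);
    }

    synchronized Set<String> unassigned() {
        return new LinkedHashSet<>(unassignedRegions);
    }
}
```

The sketch treats every region the same way, so "-ROOT-" and ".META." need no special case; they are simply the first regions parked and the first ones retried.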
> "-ROOT-", ".META." table stay offline for too long in the case of all RSs are shutdown at the same time
> -------------------------------------------------------------------------------------------------------
>                 Key: HBASE-4397
>                 URL: https://issues.apache.org/jira/browse/HBASE-4397
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HBASE-4397-0.92.patch
> 1. Shut down all RSs.
> 2. Bring all RSs back online.
> The "-ROOT-" and ".META." tables stay offline until the timeout monitor forces assignment 30 minutes later. This is because HMaster can't find an RS to assign the tables to during the assign operation.
> 2011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of -ROOT-,,0.70236052 to sea-lab-4,60020,1315870341387, trying to assign elsewhere instead; retry=0
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:345)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1002)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:854)
>         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:148)
>         at $Proxy9.openRegion(Unknown Source)
>         at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:407)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1408)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1153)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1128)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1123)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1788)
>         at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:100)
>         at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:118)
>         at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:181)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:167)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable location to assign region -ROOT-,,0.70236052
> Possible fixes:
> 1. Have serverManager handle a "server online" event, similar to how RegionServerTracker.java calls serverManager.expireServer when a server goes down.
> 2. Make timeoutMonitor handle this situation better. This is a special situation in the cluster, so the 30-minute timeout can be skipped.
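
For the second possible fix, a minimal sketch of the decision a timeout monitor could make is given here. It assumes the master records why an assignment failed; the class TimeoutCheck, the method shouldReassign, and its parameters are hypothetical names for illustration and do not come from HBase.

```java
// Hypothetical helper; not part of HBase's timeout monitor.
class TimeoutCheck {
    // The 30-minute timeout mentioned in the report, in milliseconds.
    static final long DEFAULT_TIMEOUT_MS = 30L * 60L * 1000L;

    // Decide whether a region stuck in transition should be reassigned now.
    static boolean shouldReassign(long elapsedMs,
                                  long timeoutMs,
                                  boolean failedForLackOfServers,
                                  int onlineServerCount) {
        if (onlineServerCount == 0) {
            return false; // still nowhere to assign the region
        }
        if (failedForLackOfServers) {
            return true;  // special case: skip the timeout entirely
        }
        return elapsedMs >= timeoutMs; // normal case: wait out the timeout
    }
}
```

Under these assumptions, a region that failed only because every RS was down is retried on the monitor's next pass after a server appears, while ordinary slow transitions still wait for the full timeout.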
