hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Maryann Xue (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-5816) Two concurrent assign would cause master to abort with msg "Unexpected state trying to OFFLINE; "
Date Wed, 18 Apr 2012 03:33:14 GMT

     [ https://issues.apache.org/jira/browse/HBASE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Maryann Xue updated HBASE-5816:
-------------------------------

    Status: Open  (was: Patch Available)
    
> Two concurrent assign would cause master to abort with msg "Unexpected state trying to
OFFLINE; "
> -------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5816
>                 URL: https://issues.apache.org/jira/browse/HBASE-5816
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.6
>            Reporter: Maryann Xue
>
> The first assign thread exits with success after updating the RegionState to PENDING_OPEN,
while the second assign follows immediately into "assign" and fails the RegionState check
in setOfflineInZooKeeper(). This causes the master to abort.
> In the below case, the two concurrent assigns occurred when AM tried to assign a region
to a dying/dead RS, and meanwhile the ShutdownServerHandler tried to assign this region (from
the region plan) spontaneously.
> 2012-04-17 05:44:57,648 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.,
src=hadoop05.sh.intel.com,60020,1334544902186, dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
> 2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting
unassignment of region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.
(offlining)
> 2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent
CLOSE to serverName=hadoop05.sh.intel.com,60020,1334544902186, load=(requests=0, regions=0,
usedHeap=0, maxHeap=0) for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.
> 2012-04-17 05:44:57,666 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling
new unassigned node: /hbase/unassigned/fe38fe31caf40b6e607a3e6bbed6404b (region=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.,
server=hadoop05.sh.intel.com,60020,1334544902186, state=RS_ZK_REGION_CLOSING)
> 2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing
OFFLINE; was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. state=CLOSED,
ts=1334612697672, server=hadoop05.sh.intel.com,60020,1334544902186
> 2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x236b912e9b3000e
Creating (or updating) unassigned node for fe38fe31caf40b6e607a3e6bbed6404b with OFFLINE state
> 2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using
pre-existing plan for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.;
plan=hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=hadoop05.sh.intel.com,60020,1334544902186,
dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
> 2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning
region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to xmlqa-clv16.sh.intel.com,60020,1334612497253
> 2012-04-17 05:54:19,159 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing
OFFLINE; was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. state=PENDING_OPEN,
ts=1334613179096, server=xmlqa-clv16.sh.intel.com,60020,1334612497253
> 2012-04-17 05:54:59,033 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed
assignment of TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to serverName=xmlqa-clv16.sh.intel.com,60020,1334612497253,
load=(requests=0, regions=0, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0
> java.net.SocketTimeoutException: Call to /10.239.47.87:60020 failed on socket timeout
exception: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel
to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.89:41302
remote=/10.239.47.87:60020]
>         at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:805)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:778)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:283)
>         at $Proxy7.openRegion(Unknown Source)
>         at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:573)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1127)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:912)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:892)
>         at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:92)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:162)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel
to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.89:41302
remote=/10.239.47.87:60020]
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>         at java.io.FilterInputStream.read(FilterInputStream.java:116)
>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:301)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>         at java.io.DataInputStream.readInt(DataInputStream.java:370)
>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:541)
>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:479)
> 2012-04-17 05:54:59,035 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous
transition plan was found (or we are ignoring an existing plan) for TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.
so generated a random one; hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.,
src=, dest=hadoop06.sh.intel.com,60020,1334544901894; 7 (online=7, exclude=serverName=xmlqa-clv16.sh.intel.com,60020,1334612497253,
load=(requests=0, regions=0, usedHeap=0, maxHeap=0)) available servers
> 2012-04-17 05:54:59,035 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x236b912e9b3000e
Creating (or updating) unassigned node for fe38fe31caf40b6e607a3e6bbed6404b with OFFLINE state
> 2012-04-17 05:54:59,045 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using
pre-existing plan for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.;
plan=hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=, dest=hadoop06.sh.intel.com,60020,1334544901894
> 2012-04-17 05:54:59,045 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning
region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to hadoop06.sh.intel.com,60020,1334544901894
> 2012-04-17 05:54:59,046 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state
trying to OFFLINE; TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. state=PENDING_OPEN,
ts=1334613299045, server=hadoop06.sh.intel.com,60020,1334544901894
> java.lang.IllegalStateException
>         at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1167)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1107)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:912)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:892)
>         at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:259)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:162)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-04-17 05:54:59,047 INFO org.apache.hadoop.hbase.master.HMaster: Aborting

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message