hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ramkrishna.S.Vasudevan" <ramkrishna.vasude...@huawei.com>
Subject RE: Master stuck trying to rebalance regions to a zombie region servers.
Date Tue, 12 Jun 2012 11:15:19 GMT
Hi Pradeep,

Many changes have happened from the version that you are specifying upto the
recent version.  So may be you can try out latest versions.

Regards
Ram

> -----Original Message-----
> From: Pradeep Gopaluni [mailto:pradeep.gopaluni@gmail.com]
> Sent: Tuesday, June 12, 2012 4:35 PM
> To: user@hbase.apache.org
> Subject: Master stuck trying to rebalance regions to a zombie region
> servers.
> 
> Hi,
> 
> We are using HBASE 0.90.3 and Hadoop 0.20.205 for our cluster. Today,
> we
> have observed that the HBASE master was stuck because of  zombie
> regionservers that was responding to ping but was stuck.
> 
> The logs show that the master tried to re-assign its regions from an
> overloaded (ServerA) and one of the regions to be rebalanced is
> designated
> for the server in zombie state (SeverB). This re-assignment started
> failing
> with SocketTimeoutException, and the master was stuck waiting. This got
> resolved only when ServerB was taken out of network. There were no
> master
> logs printed for close to 30 mins, between the attempted reassignment,
> and
> the time ServerB removed from network.
> 
> The issue of infinite wait seems to be very similar to the hadoop RPC
> issue
> - https://issues.apache.org/jira/browse/HADOOP-6889 reported and fixed
> in
> 0.20.205.
> Did anyone face the same problem, was this fixed in later versions of
> HBASE?
> 
> --
> Regards
> pradeep
> 
> *2012-06-07 06:59:32,693 HbaseMaster.domain.com:60000-BalancerChore
> INFO
> org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance
> in
> 1ms. Moving 26 regions off of 24 overloaded servers onto 0 less loaded
> servers
> .
> 2012-06-07 06:59:32,746 HbaseMaster.domain.com:60000-BalancerChore INFO
> org.apache.hadoop.hbase.master.HMaster: balance
> hri=users,XePh1ThgkEgb1LJG8rWWnQ==,1327804993156.b935614a50366924dfac33
> 8bf6d504ef.,
> src=ServerA.domain.com
> <http://servera.domain.com/>,60020,1338788540876,
> dest=ServerB.domain.com
> <http://serverb.domain.com/>,60020,1339042481685
> .
> .
> 2012-06-07 06:59:53,854 MASTER_CLOSE_REGION-
> HbaseMaster.domain.com:60000-3
> WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed
> assignment of
> users,XePh1ThgkEgb1LJG8rWWnQ==,1327804993156.b935614a50366924dfac338bf6
> d504ef.
> to serverName=ServerB.domain.com
> <http://serverb.domain.com/>,60020,1339042481685,
> load=(requests=0, regions=57, usedHeap=30, maxHeap=3998), trying to
> assign
> elsewhere instead; retry=0
> java.net.SocketTimeoutException: 20000 millis timeout while waiting for
> channel to be ready for connect. ch :
> java.nio.channels.SocketChannel[connection-pending remote=/
> 10.193.190.107:60020]
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.j
> ava:213)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:604)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBase
> Client.java:328)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:
> 883)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
>         at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>         at $Proxy7.openRegion(Unknown Source)
>         at
> org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManag
> er.java:559)
>         at
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManag
> er.java:934)
>         at
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManag
> er.java:749)
>         at
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManag
> er.java:729)
>         at
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(Clos
> edRegionHandler.java:92)
>         at
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:156
> )
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecut
> or.java:886)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.j
> ava:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-06-07 07:01:31,884 908082002@qtp-1543904466-10 INFO
> org.apache.zookeeper.ZooKeeper: Initiating client connection,
> connectString=
> zaas3.domain.com:2181,zaas2.domain.com:2181,zaas1.domain.com:2181,
> zaas5.domain.com:2181,zaas4.domain.com:2181 sessionTimeout=300000
> watcher=hconnection
> 2012-06-07 07:01:31,884 908082002@qtp-1543904466-10-SendThread() INFO
> org.apache.zookeeper.ClientCnxn: Opening socket connection to server
> zaas1.domain.com/10.193.190.117:2181
> 2012-06-07 07:01:31,885 908082002@qtp-1543904466-10-SendThread(
> zaas1.domain.com:2181) INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to zaas1.domain.com/10.193.190.117:2181,
> initiating
> session
> 2012-06-07 07:01:31,887 908082002@qtp-1543904466-10-SendThread(
> zaas1.domain.com:2181) INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server zaas1.domain.com/10.193.190.117:2181,
> sessionid = 0x133cce032e46274, negotiated timeout = 300000
> 2012-06-07 07:01:33,695 908082002@qtp-1543904466-10 INFO
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementa
> tion:
> Closed zookeeper sessionid=0x133cce032e46274
> 2012-06-07 07:01:33,697 908082002@qtp-1543904466-10 INFO
> org.apache.zookeeper.ZooKeeper: Session: 0x133cce032e46274 closed
> 2012-06-07 07:01:33,697 908082002@qtp-1543904466-10-EventThread INFO
> org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2012-06-07 07:04:32,738 HbaseMaster.domain.com:60000-BalancerChore INFO
> org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance
> in
> 1ms. Moving 1 regions off of 1 overloaded servers onto 0 less loaded
> servers
> 2012-06-07 07:04:32,738 HbaseMaster.domain.com:60000-BalancerChore INFO
> org.apache.hadoop.hbase.master.HMaster: balance
> hri=users,iLy9s53ke7L+6gQ9ASdzbA==,1327771137166.d75ed55fbfd7c63ab3fa3a
> 51fb3006a5.,
> src=mdg357.domain.com,60020,1338788541085,
> dest=ServerB.domain.com<http://serverb.domain.com/>
> ,60020,1339042481685
> 2012-06-07 07:04:53,290 MASTER_CLOSE_REGION-
> HbaseMaster.domain.com:60000-0
> WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed
> assignment of
> users,iLy9s53ke7L+6gQ9ASdzbA==,1327771137166.d75ed55fbfd7c63ab3fa3a51fb
> 3006a5.
> to serverName=ServerB.domain.com
> <http://serverb.domain.com/>,60020,1339042481685,
> load=(requests=0, regions=57, usedHeap=30, maxHeap=3998), trying to
> assign
> elsewhere instead; retry=0
> java.net.SocketTimeoutException: 20000 millis timeout while waiting for
> channel to be ready for connect. ch :
> java.nio.channels.SocketChannel[connection-pending remote=/
> 10.193.190.107:60020]
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.j
> ava:213)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:604)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBase
> Client.java:328)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:
> 883)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
>         at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>         at $Proxy7.openRegion(Unknown Source)
>         at
> org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManag
> er.java:559)
>         at
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManag
> er.java:934)
>         at
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManag
> er.java:749)
>         at
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManag
> er.java:729)
>         at
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(Clos
> edRegionHandler.java:92)
>         at
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:156
> )
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecut
> or.java:886)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.j
> ava:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-06-07 07:08:04,680 908082002@qtp-1543904466-10 INFO
> org.apache.zookeeper.ZooKeeper: Initiating client connection,
> connectString=
> zaas3.domain.com:2181,zaas2.domain.com:2181,zaas1.domain.com:2181,
> zaas5.domain.com:2181,zaas4.domain.com:2181 sessionTimeout=300000
> watcher=hconnection
> 2012-06-07 07:08:04,681 908082002@qtp-1543904466-10-SendThread() INFO
> org.apache.zookeeper.ClientCnxn: Opening socket connection to server
> zaas5.domain.com/10.193.190.68:2181
> 2012-06-07 07:08:04,681 908082002@qtp-1543904466-10-SendThread(
> zaas5.domain.com:2181) INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to zaas5.domain.com/10.193.190.68:2181,
> initiating
> session
> 2012-06-07 07:08:04,683 908082002@qtp-1543904466-10-SendThread(
> zaas5.domain.com:2181) INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server zaas5.domain.com/10.193.190.68:2181,
> sessionid = 0x533cce12e1a4a58, negotiated timeout = 300000
> 2012-06-07 07:08:06,147 908082002@qtp-1543904466-10 INFO
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementa
> tion:
> Closed zookeeper sessionid=0x533cce12e1a4a58
> 2012-06-07 07:08:06,149 908082002@qtp-1543904466-10 INFO
> org.apache.zookeeper.ZooKeeper: Session: 0x533cce12e1a4a58 closed
> 2012-06-07 07:08:06,149 908082002@qtp-1543904466-10-EventThread INFO
> org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2012-06-07 07:09:32,781 HbaseMaster.domain.com:60000-BalancerChore INFO
> org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance
> in
> 1ms. Moving 1 regions off of 1 overloaded servers onto 0 less loaded
> servers
> 2012-06-07 07:09:32,781 HbaseMaster.domain.com:60000-BalancerChore INFO
> org.apache.hadoop.hbase.master.HMaster: balance
> hri=users,VWsWSM9RoX+wNoekA5yXrw==,1327686874360.7e07caa524f0b5fc186919
> d402ab3d11.,
> src=mdg177.domain.com,60020,1338788540944,
> dest=ServerB.domain.com<http://serverb.domain.com/>
> ,60020,1339042481685
> 2012-06-07 07:09:54,410 MASTER_CLOSE_REGION-
> HbaseMaster.domain.com:60000-2
> WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed
> assignment of
> users,VWsWSM9RoX+wNoekA5yXrw==,1327686874360.7e07caa524f0b5fc186919d402
> ab3d11.
> to serverName=ServerB.domain.com
> <http://serverb.domain.com/>,60020,1339042481685,
> load=(requests=0, regions=57, usedHeap=30, maxHeap=3998), trying to
> assign
> elsewhere instead; retry=0
> 0.107:60020]
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.j
> ava:213)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:604)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBase
> Client.java:328)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:
> 883)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
>         at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>         at $Proxy7.openRegion(Unknown Source)
>         at
> org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManag
> er.java:559)
>         at
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManag
> er.java:934)
>         at
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManag
> er.java:749)
>         at
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManag
> er.java:729)
>         at
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(Clos
> edRegionHandler.java:92)
>         at
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:156
> )
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecut
> or.java:886)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.j
> ava:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-06-07 07:12:43,271 IPC Server handler 22 on 60000 INFO
> org.apache.hadoop.hbase.master.HMaster: Balance=false
> 2012-06-07 07:40:00,016 main-EventThread INFO
> org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer
> ephemeral node deleted, processing expiration
> [ServerB.domain.com<http://serverb.domain.com/>
> ,60020,1339042481685]*
> --
> Regards
> pradeep


Mime
View raw message