hbase-user mailing list archives

From 伍照坤 <tonywu...@gmail.com>
Subject Balance to dead region server?
Date Wed, 09 Sep 2015 00:16:00 GMT
Hi guys,

I encountered a serious problem in production: the HMaster schedules lots of
balance jobs to a dead node.

Environment: hbase-1.0.0-cdh5.4.0, hadoop-2.6.0-cdh5.4.0,
zookeeper-3.4.5-cdh5.4.0

The region server e3ecmrhdp24 has been dead since 09/03/2015.
I checked ZooKeeper (/hbase/rs) and the HBase web UI; both show this server
as a dead node.

But the HMaster still schedules lots of balance jobs to e3ecmrhdp24, even
though this region server is dead.

The balance job runs every 5 minutes and schedules 60000+ region moves
to this dead region server.

#1 The balancer on the HMaster schedules a region to move to e3ecmrhdp24.
#2 After 1 second, the HMaster assigns this region to another region server.

My guess:
#1 e3ecmrhdp24 is still a live node in the HMaster's memory.
#2 The number of regions on e3ecmrhdp24 is below the balance ratio, so
the balancer always schedules regions to this dead server.
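
To make the guess above concrete, here is a minimal toy simulation. It is not
HBase's actual balancer; plan_moves, apply_moves, simulate and the server
names rs1/rs2 are hypothetical, and the only assumption is that the master
plans moves from an in-memory server list that may still contain the dead
server. Under that assumption, the dead server always looks underloaded, every
move to it fails over to a live server, and the next cycle plans the same
moves again.

```python
DEAD = "e3ecmrhdp24"  # the dead region server from this report


def plan_moves(assignments, known_servers):
    """Toy balancer: move regions from above-average to below-average servers.

    assignments maps region -> hosting server; known_servers is the
    (possibly stale) list of servers the master believes are online.
    """
    load = {server: 0 for server in known_servers}
    for server in assignments.values():
        load[server] = load.get(server, 0) + 1
    average = len(assignments) / len(load)
    moves = []
    for region in sorted(assignments):
        source = assignments[region]
        target = min(load, key=load.get)  # least-loaded known server
        if load[source] > average and load[target] < average:
            moves.append((region, source, target))
            load[source] -= 1
            load[target] += 1
    return moves


def apply_moves(assignments, moves, live_servers):
    """Apply moves; a move to a non-live target falls back to a live server."""
    for region, _source, target in moves:
        if target not in live_servers:
            # Assumed fallback: reassign to the least-loaded live server.
            del assignments[region]
            counts = {server: 0 for server in live_servers}
            for server in assignments.values():
                counts[server] += 1
            target = min(live_servers, key=counts.get)
        assignments[region] = target


def simulate(known_servers, live_servers, cycles=3):
    """Return, per balancer cycle, how many moves targeted a non-live server."""
    assignments = {
        f"r{i}": live_servers[i % len(live_servers)] for i in range(6)
    }
    wasted = []
    for _cycle in range(cycles):
        moves = plan_moves(assignments, known_servers)
        wasted.append(sum(1 for m in moves if m[2] not in live_servers))
        apply_moves(assignments, moves, live_servers)
    return wasted


stale_view = simulate(["rs1", "rs2", DEAD], ["rs1", "rs2"])
fresh_view = simulate(["rs1", "rs2"], ["rs1", "rs2"])
print(stale_view, fresh_view)
```

With the stale server list the toy balancer plans moves to the dead server in
every cycle; with a correct list it plans none, which would match the
behaviour seen before and after the HMaster restart.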

After I restarted the HMaster, the problem went away.

It looks like a critical bug in HBase. Any hints?



​
