hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Balance to dead region server?
Date Wed, 09 Sep 2015 00:25:14 GMT
Can you pastebin the master log snippet related to the dead server?



> On Sep 8, 2015, at 5:16 PM, 伍照坤 <tonywutao@gmail.com> wrote:
> 
> Hi, guys
> 
> I encountered a serious problem in production: the HMaster schedules lots of balance jobs to a dead node.
> 
> Environment: hbase-1.0.0-cdh5.4.0, hadoop-2.6.0-cdh5.4.0, zookeeper-3.4.5-cdh5.4.0
> 
> The region server e3ecmrhdp24 has been dead since 09/03/2015.
> I checked ZooKeeper (/hbase/rs) and the HBase web UI; both confirm that this server is a dead node.
> 
> But the HMaster still schedules lots of balance jobs to e3ecmrhdp24 even though this region server is dead.
> 
> The balance job runs every 5 minutes and schedules 60000+ region balance operations to this dead region server.
> 
> #1 The balancer on the HMaster schedules a region to be moved to e3ecmrhdp24.
> #2 After 1 second, the HMaster assigns this region to another region server.
> 
> My guess:
> #1 e3ecmrhdp24 is still a live node in the HMaster's memory.
> #2 The number of regions on e3ecmrhdp24 is below the balance ratio, so the balancer always schedules regions to this dead server.
> 
> After I restarted the HMaster, the problem went away.
> 
> This looks like a critical bug in HBase. Any hints?
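
The guess in the quoted message can be illustrated with a toy model. This is only a sketch of the hypothesis, not HBase's balancer code: the function names, the server names rs1 and rs2, and the region counts are all made up for illustration. It assumes a naive balancer that targets the server with the fewest regions, and a master whose in-memory server list has not dropped the dead server.

```python
# Toy model only: this is NOT HBase's balancer. All names are hypothetical.

def pick_balance_target(regions_per_server):
    """Return the server with the fewest regions, as a naive balancer might."""
    return min(regions_per_server, key=regions_per_server.get)


def stale_servers(master_view, zk_live):
    """Servers the master still tracks that have no znode under /hbase/rs."""
    return sorted(set(master_view) - set(zk_live))


# Hypothetical in-memory view held by the master: the dead server is
# still listed, with zero regions.
master_view = {"rs1": 120, "rs2": 118, "e3ecmrhdp24": 0}

# Hypothetical children of /hbase/rs in ZooKeeper: the dead server is absent.
zk_live = ["rs1", "rs2"]

target = pick_balance_target(master_view)
stale = stale_servers(master_view, zk_live)
print(target, stale)
```

Under these assumptions the dead server is always the least-loaded candidate, so every balancer run would pick it again. Comparing the master's view with the ZooKeeper listing, as in stale_servers, is one way to spot such a mismatch; the master log requested above would be needed to confirm what actually happened.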
