hbase-issues mailing list archives

From "Mikhail Dutikov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12426) Dead Region Servers after Decommissioning
Date Thu, 09 Jun 2016 11:12:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322363#comment-15322363 ]

Mikhail Dutikov commented on HBASE-12426:

Just had the same issue (HBase 1.0.0-cdh5.4.8). Restarting HMasters didn't help.

Removing the following from HDFS (and then restarting the masters) solved the issue for me:

drwxr-xr-x   - 0 2015-08-16 09:15 /hbase/WALs/myserver1-splitting
drwxr-xr-x   - 0 2015-08-19 21:04 /hbase/WALs/myserver2-splitting
drwxr-xr-x   - 0 2015-11-02 12:20 /hbase/WALs/myserver3-splitting
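
For anyone who wants to review the candidates before deleting anything, here is a minimal sketch in Python. It assumes you have saved the output of `hdfs dfs -ls /hbase/WALs` as text, and that every directory ending in "-splitting" belongs to a server that is really gone, which you should confirm yourself. The function names are hypothetical, and the script only prints `hdfs dfs -rm -r` commands; it does not run them.

```python
import shlex


def find_splitting_dirs(ls_output):
    """Return directory paths from `hdfs dfs -ls` output that end in '-splitting'."""
    paths = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Skip blank lines, the "Found N items" header, and non-directory entries.
        if not fields or not fields[0].startswith("d"):
            continue
        path = fields[-1]  # the path is the last column of the listing
        if path.endswith("-splitting"):
            paths.append(path)
    return paths


def removal_commands(paths):
    """Build (but do not execute) one `hdfs dfs -rm -r` command per path."""
    return ["hdfs dfs -rm -r " + shlex.quote(p) for p in paths]


if __name__ == "__main__":
    # Sample listing copied from the comment above.
    sample = """\
drwxr-xr-x   - 0 2015-08-16 09:15 /hbase/WALs/myserver1-splitting
drwxr-xr-x   - 0 2015-08-19 21:04 /hbase/WALs/myserver2-splitting
drwxr-xr-x   - 0 2015-11-02 12:20 /hbase/WALs/myserver3-splitting
"""
    for cmd in removal_commands(find_splitting_dirs(sample)):
        print(cmd)
```

Printing the commands instead of executing them leaves a chance to check each directory against the list of live region servers first, since a "-splitting" directory can also belong to a WAL split that is still in progress.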

> Dead Region Servers after Decommissioning 
> ------------------------------------------
>                 Key: HBASE-12426
>                 URL: https://issues.apache.org/jira/browse/HBASE-12426
>             Project: HBase
>          Issue Type: Bug
>          Components: Balancer
>    Affects Versions:
>         Environment: RHEL 6.5
>            Reporter: Nishanth Shajahan
>            Priority: Minor
> I initially had a set of 5 region servers hosting a single table, which was pre-split
into 30 regions that were evenly distributed across all the region servers with data. I then went ahead
and removed/decommissioned a couple of region servers, so in the end I have 3 region servers. I ran
hbase hbck and verified there were 0 inconsistencies. However, when the 'status' command is issued
from the hbase shell it shows a dead region server, and the same is displayed in the master UI
as well. Failover of the hbase master did not fix the issue. On investigation we could see some
WAL entries which were still pointing to the old region server.
> /hbase/WALs/myserver,60020,1406745344969-splitting
> After removing these orphan entries from HDFS and failing over the master, the dead region servers
went away. I wonder if this could have caused any replication issues in the cluster.

This message was sent by Atlassian JIRA
