hbase-issues mailing list archives

From "Samir Ahmic (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-14664) Master failover issue: Backup master is unable to start if active master is killed and started in short time interval
Date Thu, 12 Nov 2015 19:19:11 GMT

     [ https://issues.apache.org/jira/browse/HBASE-14664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samir Ahmic updated HBASE-14664:
--------------------------------
    Status: Open  (was: Patch Available)

Removing the "/hbase/meta-region-server" znode to avoid the backup master startup failure will cause issues in the recovery process. I will try to find another solution for this issue.
A simple workaround is to set "hbase.balancer.tablesOnMaster" to "none".
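For reference, the workaround might look like this as an hbase-site.xml entry. This is a minimal sketch: the property name and the value "none" come from the comment above, but placing it in hbase-site.xml on every master, and needing a restart to pick it up, are assumptions. The idea is that with "none" the master should not host any tables, so hbase:meta would not be located on a master that gets killed and restarted.

{code:xml}
<!-- Assumption: placed inside the <configuration> element of hbase-site.xml
     on each master; the masters likely need a restart to pick it up. -->
<property>
  <!-- Tables the balancer may place on the master; "none" is intended to
       keep all tables, including hbase:meta, off the master. -->
  <name>hbase.balancer.tablesOnMaster</name>
  <value>none</value>
</property>
{code}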

> Master failover issue: Backup master is unable to start if active master is killed and started in short time interval
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14664
>                 URL: https://issues.apache.org/jira/browse/HBASE-14664
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 2.0.0
>            Reporter: Samir Ahmic
>            Assignee: Samir Ahmic
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14664.patch, HBASE-14664.patch
>
>
> I noticed this issue while running IntegrationTestDDLMasterFailover. It can be reproduced simply by executing the following on the active master (tested on a cluster with two masters and three region servers):
> {code}
> $ kill -9 master_pid; hbase-daemon.sh start master
> {code} 
> The logs show that the new active master is trying to locate the hbase:meta table on the restarted active master:
> {code}
> 2015-10-21 19:28:20,804 INFO  [hnode2:16000.activeMasterManager] zookeeper.MetaTableLocator: Failed verification of hbase:meta,,1 at address=hnode1,16000,1445447051681, exception=org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1092)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegionInfo(RSRpcServices.java:1330)
>         at org.apache.hadoop.hbase.master.MasterRpcServices.getRegionInfo(MasterRpcServices.java:1525)
>         at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22233)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2136)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-10-21 19:28:20,805 INFO  [hnode2:16000.activeMasterManager] master.HMaster: Meta was in transition on hnode1,16000,1445447051681
> 2015-10-21 19:28:20,805 INFO  [hnode2:16000.activeMasterManager] master.AssignmentManager: Processing {1588230740 state=OPEN, ts=1445448500598, server=hnode1,16000,1445447051681
> {code}
> Because of the above, the master is unable to read the hbase:meta table:
> {code}
> 2015-10-21 19:28:49,429 INFO  [hconnection-0x6e9cebcc-shared--pool6-t1] client.AsyncProcess: #2, table=hbase:meta, attempt=10/351 failed=1ops, last exception: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1092)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2083)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32462)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2136)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> As a result, the master is unable to complete its startup.
> I have also noticed that in this case the value of the /hbase/meta-region-server znode always points to the restarted active master (hnode1 in my cluster).
> I was able to work around this issue by repeating the same scenario with the following:
> {code}
> $ kill -9 master_pid; hbase zkcli rmr /hbase/meta-region-server; hbase-daemon.sh start master
> {code}
> So the issue is probably caused by a stale value in the /hbase/meta-region-server znode. I will try to create a patch based on the above.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
