hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14802) Replaying server crash recovery procedure after a failover causes incorrect handling of deadservers
Date Sun, 15 Nov 2015 23:42:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006097#comment-15006097 ]

Hudson commented on HBASE-14802:
--------------------------------

SUCCESS: Integrated in HBase-1.2-IT #283 (See [https://builds.apache.org/job/HBase-1.2-IT/283/])
HBASE-14802 Replaying server crash recovery procedure after a failover (stack: rev f5cebbbaf14e32d47b7659300f992028fb376225)
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDeadServer.java


> Replaying server crash recovery procedure after a failover causes incorrect handling of deadservers
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14802
>                 URL: https://issues.apache.org/jira/browse/HBASE-14802
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 2.0.0, 1.2.0, 1.2.1
>            Reporter: Ashu Pachauri
>            Assignee: Ashu Pachauri
>             Fix For: 2.0.0, 1.2.0, 1.3.0
>
>         Attachments: 14802.addendum.branch-1.txt, HBASE-14802-1.patch, HBASE-14802-2.patch, HBASE-14802-3.patch, HBASE-14802-4.patch, HBASE-14802.patch
>
>
> Dead servers are processed by launching a ServerCrashProcedure for a server after it is added to the dead servers list.
> Every time a server is added to the dead list, a counter "numProcessing" is incremented, and it is decremented when a crash recovery procedure finishes. Since adding a dead server and recovering it are two separate events, the counter can become inconsistent.
> If a master failover occurs in the middle of crash recovery, the numProcessing counter resets, but the ServerCrashProcedure is replayed by the new master. This causes the counter to go negative and makes the master think that dead servers are still in the process of recovery.
> As a result, the balancer ceases to run after such a failover.
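
The failure mode described above can be sketched roughly as follows. This is a simplified, hypothetical model and not the actual HBase DeadServer code: the class name DeadServerCounterSketch, its methods, the server name string, and the assumption that the in-progress check is "numProcessing != 0" are all illustrative. Only the counter name numProcessing and the increment/decrement behavior come from the description.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified model of dead-server bookkeeping (not HBase's real class). */
class DeadServerCounterSketch {
    private final Set<String> deadServers = new HashSet<>();
    private int numProcessing = 0;

    /** Called when a server is added to the dead list: the counter goes up. */
    void add(String serverName) {
        deadServers.add(serverName);
        numProcessing++;
    }

    /** Called when a crash recovery procedure finishes: the counter goes down,
     *  whether or not this instance ever saw the matching add(). */
    void finish(String serverName) {
        numProcessing--;
    }

    /** Assumed check: any nonzero counter reads as "recovery still in progress". */
    boolean areDeadServersInProgress() {
        return numProcessing != 0;
    }

    int getNumProcessing() {
        return numProcessing;
    }

    /** Walks through the failover scenario and returns the new master's state. */
    static DeadServerCounterSketch simulateFailover() {
        // Old master: the server is added to the dead list, recovery starts.
        DeadServerCounterSketch oldMaster = new DeadServerCounterSketch();
        oldMaster.add("rs1.example.org,16020,1");   // oldMaster counter is now 1

        // Failover: the new master starts with fresh in-memory state (counter 0).
        DeadServerCounterSketch newMaster = new DeadServerCounterSketch();

        // The persisted crash procedure is replayed and finishes on the new
        // master, which decrements a counter that was never incremented there.
        newMaster.finish("rs1.example.org,16020,1");
        return newMaster;
    }
}
```

In this model the new master ends up with a counter of -1, so the in-progress check keeps returning true even though no recovery is pending, which would be consistent with the balancer never running again.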



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
