zookeeper-dev mailing list archives

From Andrew Purtell <apurt...@salesforce.com>
Subject Partial crash bug described in Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions (FAST17)
Date Thu, 02 Mar 2017 05:37:10 GMT
Is there a JIRA open for the partial crash bug described in "Redundancy
Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions
to Single Errors and Corruptions" by Aishwarya Ganesan, Ramnatthan Alagappan,
Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau (University of
Wisconsin-Madison), 15th USENIX Conference on File and Storage Technologies
(FAST ’17)?

From
https://www.usenix.org/system/files/conference/fast17/fast17-ganesan.pdf


"Unfortunately, ZooKeeper does not recover from write errors to the
transaction head and log tail. On write errors during log initialization,
the error handling code tries to gracefully shutdown the node but kills
only the transaction processing threads; the quorum thread remains alive
(partial crash). Consequently, other nodes believe that the leader is
healthy and do not elect a new leader. However, since the leader has
partially crashed, it cannot propose any transactions, leading to an
indefinite write unavailability."
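
To make the failure mode in the quote concrete, here is a toy Python sketch.
It is not ZooKeeper code: ToyNode, observe, and the fail_log_init flag are
hypothetical names, and the heartbeat loop only stands in for the quorum
thread. It assumes the simplest possible model, in which the error handler
ends the transaction thread without signalling anything else.

```python
import threading
import time


class ToyNode:
    """Toy leader with two independent threads (illustrative only)."""

    def __init__(self, fail_log_init):
        self.fail_log_init = fail_log_init
        self.stop = threading.Event()
        self.heartbeats = 0   # what peers would observe
        self.committed = 0    # transactions actually processed
        self.txn_thread = threading.Thread(target=self._process_transactions)
        self.quorum_thread = threading.Thread(target=self._send_heartbeats)

    def _init_log(self):
        if self.fail_log_init:
            raise OSError("simulated write error during log initialization")

    def _process_transactions(self):
        try:
            self._init_log()
        except OSError:
            # Flawed "graceful shutdown": only this thread exits, and the
            # quorum thread is never told to stop.
            return
        while not self.stop.is_set():
            self.committed += 1
            time.sleep(0.01)

    def _send_heartbeats(self):
        # Keeps running regardless of the transaction thread's state.
        while not self.stop.is_set():
            self.heartbeats += 1
            time.sleep(0.01)

    def start(self):
        self.txn_thread.start()
        self.quorum_thread.start()

    def shutdown(self):
        self.stop.set()
        self.txn_thread.join()
        self.quorum_thread.join()


def observe(fail_log_init, duration=0.2):
    """Run a node briefly and report thread liveness and counters."""
    node = ToyNode(fail_log_init)
    node.start()
    time.sleep(duration)
    txn_alive = node.txn_thread.is_alive()
    quorum_alive = node.quorum_thread.is_alive()
    node.shutdown()
    return txn_alive, quorum_alive, node.committed, node.heartbeats
```

With observe(True), the transaction thread is dead while heartbeats keep
counting up, so a peer that watches only heartbeats has no reason to start
an election. A fix in this toy model would be to set the stop event in the
error handler so the whole node goes down together.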




-- 
Best regards,
Andrew Purtell
apurtell@salesforce.com
apurtell@apache.org
