zookeeper-dev mailing list archives

From Rakesh Radhakrishnan <rake...@apache.org>
Subject Re: Partial crash bug described in Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions (FAST17)
Date Thu, 02 Mar 2017 15:45:22 GMT
Thanks a lot, Andrew Purtell, for pointing this out.

I see that https://issues.apache.org/jira/browse/ZOOKEEPER-2247 describes a
similar case. Could you please go through that JIRA and let me know your
comments?

It seems they used ZooKeeper v3.4.8 to prepare the report. The fix for this
bug is available only in the latest stable version, 3.4.9.

Thanks,
Rakesh

On Thu, Mar 2, 2017 at 11:07 AM, Andrew Purtell <apurtell@salesforce.com>
wrote:

> Is there a JIRA open for the partial crash bug described in "Redundancy
> Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions
> to Single Errors and Corruptions" Aishwarya Ganesan, Ramnatthan Alagappan,
> Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of
> Wisconsin—Madison. 15th USENIX Conference on File and Storage Technologies
> (FAST ’17)?
>
> From
> https://www.usenix.org/system/files/conference/fast17/fast17-ganesan.pdf
>
>
> "Unfortunately, ZooKeeper does not recover from write errors to the
> transaction head and log tail. On write errors during log initialization,
> the error handling code tries to gracefully shutdown the node but kills
> only the transaction processing threads; the quorum thread remains alive
> (partial crash). Consequently, other nodes believe that the leader is
> healthy and do not elect a new leader. However, since the leader has
> partially crashed, it cannot propose any transactions, leading to an
> indefinite write unavailability."
>
> --
> Best regards,
> Andrew Purtell
> apurtell@salesforce.com
> apurtell@apache.org
>
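
The partial crash described in the quoted passage can be illustrated with a toy model. This is only a sketch under stated assumptions, not ZooKeeper code: the `Node` class, its methods, the `full_shutdown_on_error` flag, and `followers_elect_new_leader` are all hypothetical names invented for illustration. It shows why a leader that stops transaction processing but keeps its quorum thread alive is never replaced, while a leader that shuts down completely is.

```python
class Node:
    """Toy leader with two components (hypothetical, not ZooKeeper's classes):
    a transaction processor and a quorum thread that answers heartbeats."""

    def __init__(self, full_shutdown_on_error):
        # Assumption: this flag models whether the error handler stops
        # every component or only the transaction processing side.
        self.full_shutdown_on_error = full_shutdown_on_error
        self.processor_alive = True
        self.quorum_alive = True

    def on_log_write_error(self):
        # The error handler always stops transaction processing.
        self.processor_alive = False
        # Only a full shutdown also stops the quorum thread.
        if self.full_shutdown_on_error:
            self.quorum_alive = False

    def answers_heartbeat(self):
        # Followers see the leader as healthy while the quorum thread runs.
        return self.quorum_alive

    def can_propose(self):
        # Proposing a transaction needs both components.
        return self.processor_alive and self.quorum_alive


def followers_elect_new_leader(leader):
    # Followers start an election only when the leader stops responding.
    return not leader.answers_heartbeat()


if __name__ == "__main__":
    for full in (False, True):
        node = Node(full_shutdown_on_error=full)
        node.on_log_write_error()
        print(
            f"full_shutdown={full}: "
            f"can_propose={node.can_propose()}, "
            f"new_election={followers_elect_new_leader(node)}"
        )
```

In this model, the partial-shutdown case leaves the node unable to propose while no election is triggered, which corresponds to the indefinite write unavailability the paper reports. The full-shutdown case lets followers detect the failure and elect a new leader.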
