zookeeper-dev mailing list archives

From "Abraham Fine (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (ZOOKEEPER-2528) ZooKeeper cluster can become unavailable due to power failures
Date Thu, 09 Mar 2017 20:39:38 GMT

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abraham Fine reassigned ZOOKEEPER-2528:
---------------------------------------

    Assignee:     (was: Abraham Fine)

> ZooKeeper cluster can become unavailable due to power failures
> --------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2528
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2528
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.8
>         Environment: A normal ZooKeeper cluster of 3 nodes running on 3 Linux machines.

>            Reporter: Ramnatthan Alagappan
>            Priority: Critical
>
> A ZooKeeper cluster can become unavailable if power failures happen at certain specific points in time.
> Details:
> I am running a three-node ZooKeeper cluster. I perform a simple update from a client machine.
> When I try to update a value, ZooKeeper may create a new log file (for example, when the current log is fully utilized). First, it creates the file and then appends some header information to the newly created log. The system call sequence looks like this:
> creat(log.200000001)
> append(log.200000001, offset=0,  count=16)
> Now, if a power failure happens just after the creat of the log file but before the append of the header information, the node simply crashes with an EOF exception. If the same problem occurs on two or more nodes in my three-node cluster, the entire cluster becomes unavailable, because a majority of the servers have crashed.
> A simultaneous power failure across multiple nodes is plausible in single-data-center or single-rack deployments.
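
The failure window described above can be reproduced in isolation. The following is a minimal sketch, not ZooKeeper's actual code: the class `TruncatedLogDemo`, its methods, and the second file name `log.200000002` are hypothetical, and the 16-byte header size is taken from the `count=16` in the reported system call sequence. It creates a zero-length file (the on-disk state after the creat but before the append) and shows that a strict fixed-size header read fails with `EOFException`, while a tolerant check can report the file as incomplete instead.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class TruncatedLogDemo {
    // Assumption: header size taken from append(log.200000001, offset=0, count=16).
    static final int HEADER_BYTES = 16;

    /** Strict read of the fixed-size header; throws EOFException if the file is shorter. */
    static byte[] readHeader(Path log) throws IOException {
        try (InputStream in = Files.newInputStream(log);
             DataInputStream data = new DataInputStream(in)) {
            byte[] header = new byte[HEADER_BYTES];
            data.readFully(header); // fails with EOFException on a zero-length file
            return header;
        }
    }

    /** Hypothetical tolerant check: true only if a complete header is on disk. */
    static boolean hasCompleteHeader(Path log) throws IOException {
        try {
            readHeader(log);
            return true;
        } catch (EOFException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("zk-log-demo");

        // State after creat() but before the header append: an empty file.
        Path empty = Files.createFile(dir.resolve("log.200000001"));
        System.out.println("empty log has header: " + hasCompleteHeader(empty));

        // State after the append completes: 16 header bytes are present.
        // The file name below is illustrative only.
        Path complete = dir.resolve("log.200000002");
        Files.write(complete, new byte[HEADER_BYTES]);
        System.out.println("complete log has header: " + hasCompleteHeader(complete));
    }
}
```

Whether a server should skip, delete, or refuse to start on such a file is a design decision this sketch does not settle; it only illustrates why an unguarded header read turns a zero-length log into a crash.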



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
