zookeeper-notifications mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [zookeeper] maoling commented on a change in pull request #1063: ZOOKEEPER 3522: Consistency guarantees discussion.
Date Tue, 27 Aug 2019 02:51:59 GMT
maoling commented on a change in pull request #1063: ZOOKEEPER 3522: Consistency guarantees
URL: https://github.com/apache/zookeeper/pull/1063#discussion_r317873957

 File path: zookeeper-docs/src/main/resources/markdown/zookeeperInternals.md
 @@ -264,6 +266,26 @@ all packets, it all falls apart. Also, our leader activation phase is
 both of them. In particular, our use of epochs allows us to skip blocks of uncommitted
 proposals and to not worry about duplicate proposals for a given zxid.
+<a name="sc_consistency"></a>
+## Consistency Guarantees
+ZooKeeper [consistency](https://jepsen.io/consistency) guarantees lie between sequential
consistency and linearizability. Here, we explain the exact consistency guarantees that ZooKeeper
provides.
+Write operations in ZooKeeper are linearizable. In other words, each write appears to take
effect atomically at some point between its invocation and its response. This means that the
writes performed by all the clients in ZooKeeper can be totally ordered in a way that
respects the real-time ordering of these writes. However, note that just stating that writes
are linearizable is meaningless unless we also talk about read operations.
+Read operations in ZooKeeper are not linearizable, since they can return stale
data. This happens because a read in ZooKeeper is not a quorum operation: a server responds
immediately to a client that is performing a read, using its own local state.
+ZooKeeper makes this choice because it favors performance in the trade-off
between performance and consistency. Nevertheless, ZooKeeper read operations are sequentially
consistent, since read operations appear to take effect in some sequential order that
furthermore respects the order of each client's operations.
+If a client wants to read the freshest data, it is generally assumed that the client should
first perform a sync operation, and then a read.
+However, even with a sync before a read operation, a client might retrieve stale data.
+This can occur because `sync` is [not a quorum operation](https://issues.apache.org/jira/browse/ZOOKEEPER-1675).
Such a scenario might arise if two servers think that they are the leader at the same time,
which may occur if the time it takes for a TCP connection to drop is smaller than
`syncTime * tickTime`, something that is [unlikely](https://www.amazon.com/ZooKeeper-Distributed-Coordination-Flavio-Junqueira/dp/1449361307)
to occur in practice.
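
The stale-read behaviour described in the hunk above, and the effect of a sync before a read, can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not ZooKeeper's implementation: the `Server` and `Ensemble` classes and the `demo` function are hypothetical names, the zxid is a plain counter, the quorum acknowledgement of writes is not modelled, and the lagging follower stands for a server that has not yet received the latest committed write. The two-leader corner case in which sync itself can be stale is not modelled either.

```python
class Server:
    """A replica that applies committed writes in zxid order (toy model)."""

    def __init__(self):
        self.data = {}
        self.last_zxid = 0

    def apply(self, zxid, path, value):
        self.data[path] = value
        self.last_zxid = zxid


class Ensemble:
    """Hypothetical ensemble: one leader plus one follower that may lag."""

    def __init__(self):
        self.leader = Server()
        self.follower = Server()
        self.log = []  # committed (zxid, path, value) entries, in order

    def write(self, path, value):
        # Writes go through the leader and receive a totally ordered zxid.
        zxid = len(self.log) + 1
        self.log.append((zxid, path, value))
        self.leader.apply(zxid, path, value)
        return zxid

    def sync(self):
        # The follower catches up with everything committed so far.
        for zxid, path, value in self.log[self.follower.last_zxid:]:
            self.follower.apply(zxid, path, value)

    def read(self, path):
        # Reads are served from the follower's local state, with no quorum.
        return self.follower.data.get(path)


def demo():
    ensemble = Ensemble()
    ensemble.write("/config", "v1")
    ensemble.sync()
    ensemble.write("/config", "v2")   # the follower has not seen this yet
    stale = ensemble.read("/config")  # served locally, so still the old value
    ensemble.sync()
    fresh = ensemble.read("/config")  # after sync the follower has caught up
    return stale, fresh
```

In this model `demo()` returns the old value for the first read and the new value for the read that follows the sync, which is the pattern the text recommends for clients that need fresh data.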
 Review comment:

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

With regards,
Apache Git Services
