zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Flavio Junqueira (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2619) Client library reconnecting breaks FIFO client order
Date Tue, 25 Oct 2016 12:01:04 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605104#comment-15605104

Flavio Junqueira commented on ZOOKEEPER-2619:

bq. In my three-line example, it doesn't matter whether the second create is synchronous (mixing)
or asynchronous (not mixing). Isn't it fine to issue a series of asynchronous calls ending
with a synchronous call?

Maybe, it depends on where you make the synchronous call. If you call it from a callback,
then you may end up re-ordering the results. After checking it again, I don't think it is
super important for this discussion. In your simple example, you're right that it doesn't
matter whether the second call is sync or async.

I like the idea of not making the behavior global and configurable via switch. In particular,
the configuration switch is error prone. I still think that mixing API use and different behavior,
even across modules, with the same ZK handle isn't good practice, but I'm fine with leaving
it up to the application to decide, though.

On {{getConnection}, I have a few of high-level points:

# I like the fact that it is all client side and it makes it clear to the application that
requests are being sent over a given connection. Once the connection drops, a proper suffix
of the operations will receive an error.
# I'd rather not have the sync api and the async api over different objects. The proposal
has the sync api in the zookeeper handle and the async api in the connection object. I'll
discuss this some more below.
# I don't like much the idea of exposing a "connection". The abstraction we have historically
is a session and the fact that a session can have multiple connections over time is mostly
transparent to the application, except for the connection loss event, which makes it clear
that there are multiple connections going on underneath.

On the second point, we don't necessarily have to move the async api to such a connection
object. We could instead have something like a token and the token essentially represents
a connection. If the connection drops, then the token is invalidated. With such a token, we
will be adding a new call for each operations to the handle, something like:

public void create(final String path, 
                            byte data[],
                            List<ACL> acl,
                            CreateMode createMode,
                            StringCallback cb,
                            Object ctx,
                            FIFOToken token);


and also a call to get the current token:

public FIFOToken getFifoToken() {
    return this.fifoToken;

Again, the main reason for suggesting this is to enable such FIFO-enforced calls in the handle
API rather than having a connection object.

Another point I was thinking about with respect to the use of {{getConnection}} is the following.
If I'm a developer, where should I call it and when? In principle, we can do it upon receiving
a {{SyncConnected}} event via the default watcher (transition from {{Disconnected}} to {{SyncConnected}).
It might not be possible to do it at that point in the case I have one or more threads submitting
operations, though. The applications will need to pause before we get the new connection,
otherwise we can have a mix of threads sending to the invalid connection while others send
to the new connection. In the case it is ok that the FIFO order guarantee is per thread, we
can have each thread individually calling {{getConnection}} (or whatever other variant we
end up agreeing upon) upon receiving a connection loss error. 

The last point of the list is important because we have proposed quite some time back to remove
connection loss by resending upon reconnecting in ZOOKEEPER-22. If we resend, then we won't
be erroring out requests with a connection loss event upon dropping a connection. If we do
it, then we avoid the gaps that we are discussing in this issue in the first place. ZOOKEEPER-22
needs work on the server-side, though. On the positive side, the client-side API changes are

> Client library reconnecting breaks FIFO client order
> ----------------------------------------------------
>                 Key: ZOOKEEPER-2619
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2619
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Diego Ongaro
> According to the USENIX ATC 2010 [paper|https://www.usenix.org/conference/usenix-atc-10/zookeeper-wait-free-coordination-internet-scale-systems],
ZooKeeper provides "FIFO client order: all requests from a given client are executed in the
order that they were sent by the client." I believe applications written using the Java client
library are unable to rely on this guarantee, and any current application that does so is
broken. Other client libraries are also likely to be affected.
> Consider this application, which is simplified from the algorithm described on Page 4
(right column) of the paper:
> {code}
>   zk = new ZooKeeper(...)
>   zk.createAsync("/data-23857", "...", callback)
>   zk.createSync("/pointer", "/data-23857")
> {code}
> Assume an empty ZooKeeper database to begin with and no other writers. Applying the above
definition, if the ZooKeeper database contains /pointer, it must also contain /data-23857.
> Now consider this series of unfortunate events:
> {code}
>   zk = new ZooKeeper(...)
>   // The library establishes a TCP connection.
>   zk.createAsync("/data-23857", "...", callback)
>   // The library/kernel closes the TCP connection because it times out, and
>   // the create of /data-23857 is doomed to fail with ConnectionLoss. Suppose
>   // that it never reaches the server.
>   // The library establishes a new TCP connection.
>   zk.createSync("/pointer", "/data-23857")
>   // The create of /pointer succeeds.
> {code}
> That's the problem: subsequent operations get assigned to the new connection and succeed,
while earlier operations fail.
> In general, I believe it's impossible to have a system with the following three properties:
>  # FIFO client order for asynchronous operations,
>  # Failing operations when connections are lost, AND
>  # Transparently reconnecting when connections are lost.
> To argue this, consider an application that issues a series of pipelined operations,
then upon noticing a connection loss, issues a series of recovery operations, repeating the
recovery procedure as necessary. If a pipelined operation fails, all subsequent operations
in the pipeline must also fail. Yet the client must also carry on eventually: the recovery
operations cannot be trivially failed forever. Unfortunately, the client library does not
know where the pipelined operations end and the recovery operations begin. At the time of
a connection loss, subsequent pipelined operations may or may not be queued in the library;
others might be upcoming in the application thread. If the library re-establishes a connection
too early, it will send pipelined operations out of FIFO client order.
> I considered a possible workaround of having the client diligently check its callbacks
and watchers for connection loss events, and do its best to stop the subsequent pipelined
operations at the first sign of a connection loss. In addition to being a large burden for
the application, this does not solve the problem all the time. In particular, if the callback
thread is delayed significantly (as can happen due to excessive computation or scheduling
hiccups), the application may not learn about the connection loss event until after the connection
has been re-established and after dependent pipelined operations have already been transmitted
over the new connection.
> I suggest the following API changes to fix the problem:
>  - Add a method ZooKeeper.getConnection() returning a ZKConnection object. ZKConnection
would wrap a TCP connection. It would include all synchronous and asynchronous operations
currently defined on the ZooKeeper class. Upon a connection loss on a ZKConnection, all subsequent
operations on the same ZKConnection would return a Connection Loss error. Upon noticing, the
client would need to call ZooKeeper.getConnection() again to get a working ZKConnection object,
and it would execute its recovery procedure on this new connection.
>  - Deprecate all asynchronous methods on the ZooKeeper object. These are unsafe to use
if the caller assumes they're getting FIFO client order.
>  - No changes to the protocols or servers are required.
> I recognize this could cause a lot of code churn for both ZooKeeper and projects that
use it. On the other hand, the existing asynchronous calls in applications should now be audited
> The code affected by this issue may be difficult to contain:
>  - It likely affects all ZooKeeper client libraries that provide both asynchronous operations
and transparent reconnection. That's probably all versions of the official Java client library,
as well as most other client libraries.
>  - It affects all applications using those libraries that depend on the FIFO client order
of asynchronous operations. I don't know how common that is, but the paper implies that FIFO
client order is important.
>  - Fortunately, the issue can only manifest itself when connections are lost and transparently
reestablished. In practice, it may also require a long pipeline or a significant delay in
the application thread while the library establishes a new connection.
>  - In case you're wondering, this issue occurred to me while working on a new client
library for Go. I haven't seen this issue in the wild, but I was able to produce it locally
by placing sleep statements in a Java program and closing its TCP connections.
> I'm new to this community, so I'm looking forward to the discussion. Let me know if I
can clarify any of the above.

This message was sent by Atlassian JIRA

View raw message