zookeeper-dev mailing list archives

From Tom Lee <apa...@tomlee.co>
Subject QuorumCnxManager "challenge" protocol details?
Date Mon, 09 Apr 2018 20:13:17 GMT

Relatively new to the ZK code base, please be gentle. :) This is bordering
on a question for users@, but I'm asking here because I'm more than happy
to try and dig into the code if it's not too far beyond my reach -- hope
that's okay.

I'm trying to dig into / work around ZOOKEEPER-2938:


Unfortunately, the proposed work-around (simply restarting the leader)
isn't particularly great for us because of some limitations in our
automation -- so I'm trying to see if we can find some alternatives and/or
fix the issue properly.

Looking at
-- afaict what's happening is the "unhappy"/prospective member of the
quorum is attempting to connect to other, established members, sends a
challenge request (which seems to just be a simple payload consisting of
its ID and the local election host + port), then promptly closes the
connection because its own ID is less than that of the recipient(s) --
seemingly without waiting for a response.
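
To make the behaviour described above concrete, here is a rough Java sketch
of the initiator side as I understand it. It is not the real
QuorumCnxManager code: the class name, the method names and the byte layout
(8-byte ID, 4-byte length, then "host:port") are all assumptions made up
for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: models the initiator's behaviour as described in this
// message, not the actual ZooKeeper implementation.
final class ChallengeSketch {

    // Builds the "challenge" payload: the sender's ID plus its election
    // host and port. The layout here is hypothetical.
    static byte[] encodeChallenge(long localId, String electionHost, int electionPort) {
        byte[] addr = (electionHost + ":" + electionPort).getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + Integer.BYTES + addr.length)
                .putLong(localId)      // sender's server ID
                .putInt(addr.length)   // length of the address string
                .put(addr)             // "host:port" of the election endpoint
                .array();
    }

    // After sending the challenge, the initiator closes the connection when
    // its own ID is less than the recipient's ID.
    static boolean initiatorDropsConnection(long localId, long remoteId) {
        return localId < remoteId;
    }
}
```

With this sketch, a peer with ID 1 connecting to a peer with ID 3 would
send its payload and then drop the connection, while a peer with ID 3
connecting to ID 1 would keep it open.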

The mechanics are all easy enough to understand, but I feel like I'm
lacking some context RE: what's *supposed* to happen here. When this code
is all working as expected, what *should* happen with respect to these
challenges? What is this code trying to achieve by forcefully disconnecting
from peers with an ID greater than the local peer?

I also don't fully understand why restarting the leader would fix things,
but that's probably just something I need to dive into to get to the bottom
of this.

Appreciate any guidance.

