zookeeper-user mailing list archives

From Alexander Shraer <shra...@gmail.com>
Subject Re: Best-practice guides on coordination of operations in distributed systems (and some C client specific questions)
Date Tue, 12 Jan 2016 19:51:07 GMT
Hi,

With your suggestion, the following scenario seems possible: master A is
about to write value X to an external system, so it logs the operation to ZK,
then freezes for some time. ZK suspects it has failed, another master B is
elected, B writes X (completing what A wanted to do), then starts doing
something else and writes Y. Then master A "wakes up" and re-executes the old
operation, writing X, which is now stale.

Perhaps, if your external system supports conditional updates, this can be
avoided: a write of X only succeeds if the current state is as expected.
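
For illustration only (your external system's API will differ), the same
conditional-update idea expressed against ZooKeeper's own versioned setData
looks roughly like this; the class and method names are made up:

import java.util.Arrays;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

class ConditionalWrite {
    // Write newValue only if the state is still what the decision was based on.
    // Returns true if the write went through, false if someone changed it first.
    static boolean writeIfUnchanged(ZooKeeper zk, String path, byte[] expected, byte[] newValue)
            throws KeeperException, InterruptedException {
        Stat stat = new Stat();
        byte[] current = zk.getData(path, false, stat);    // read current value + version
        if (!Arrays.equals(current, expected)) {
            return false;                                   // state already moved on
        }
        try {
            // fails with BadVersionException if another master wrote in between
            zk.setData(path, newValue, stat.getVersion());
            return true;
        } catch (KeeperException.BadVersionException e) {
            return false;                                   // drop the stale write
        }
    }
}

If the external store exposes a similar compare-and-set, the master would pass
along the state it observed and let the store reject the stale write of X.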

Alex

On Tue, Jan 5, 2016 at 1:00 AM, singh.janmejay <singh.janmejay@gmail.com>
wrote:

> Thanks for the replies, everyone; most of this was very useful.
>
> @Alexander: The section of the Chubby paper you pointed me to tries to
> address exactly what I was looking for. That is clearly one good way of
> doing it. I'm also thinking of an alternative way and could use a review
> or some feedback on it.
>
> @Powell: About x509 auth on intra-cluster communication, I don't have a
> blocking need for it, as the same effect can be achieved by setting up
> firewall rules that accept connections only from the desired hosts. It may
> be a good-to-have thing though, because in cloud-based scenarios where IP
> addresses are re-used, a recycled IP can still talk to a secure ZK cluster
> unless the config is changed to remove the old peer IP and replace it with
> the new one. It's clearly a corner case though.
>
> Here is the approach I'm thinking of:
> - Implement all operations (at least master-triggered operations) on
> operand machines idempotently
> - Have the master journal these operations to ZK before issuing the RPC
> (see the sketch just after this list)
> - In case the master fails with some of these operations in flight, the
> newly elected master will need to read all issued (but not yet retired)
> operations and issue them again
> - The existing master (before or after a failure) can retry and retire
> operations according to whatever the retry policy and success criterion is
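>
> Roughly, that journaling step might look like this with the standard ZK
> Java client (znode paths and class names are purely illustrative, and the
> /journal parent is assumed to exist):
>
> import java.util.List;
> import org.apache.zookeeper.CreateMode;
> import org.apache.zookeeper.KeeperException;
> import org.apache.zookeeper.ZooDefs;
> import org.apache.zookeeper.ZooKeeper;
>
> class OperationJournal {
>     private final ZooKeeper zk;
>     OperationJournal(ZooKeeper zk) { this.zk = zk; }
>
>     // Master: journal the (idempotent) operation before issuing the RPC.
>     String journal(byte[] serializedOp) throws KeeperException, InterruptedException {
>         return zk.create("/journal/op-", serializedOp,
>                 ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
>     }
>
>     // Master: retire the operation once the operand machine acks it.
>     void retire(String opPath) throws KeeperException, InterruptedException {
>         zk.delete(opPath, -1);
>     }
>
>     // Newly elected master: list issued-but-not-retired operations to re-issue.
>     List<String> pendingOps() throws KeeperException, InterruptedException {
>         return zk.getChildren("/journal", false);
>     }
> }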
>
> Why I am thinking of this, as opposed to going with Chubby-style sequencer
> passing:
> - I need to implement idempotency regardless, because the recovery path
> where the master dies after successfully executing an operation but before
> writing the ack to the coordination service requires it. So the complexity
> of an idempotent implementation is here to stay.
> - Sequencer passing would increase the surface area of the architecture
> that is exposed to the coordination service (for sequencer validation),
> which may put a verification RPC in the data plane in some cases.
> - The sequencer may expire after verification but before the ack, in which
> case the new master may not recognize the operation as consistent with its
> decisions (or previous decision path).
>
> Thoughts? Suggestions?
>
>
>
> On Sun, Jan 3, 2016 at 2:18 PM, Alexander Shraer <shralex@gmail.com>
> wrote:
> > regarding atomic multi-znode updates -- check out "multi" updates:
> > http://tdunning.blogspot.com/2011/06/tour-of-multi-update-for-zookeeper.html
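> >
> > For example, with the Java client, something like the following commits
> > both updates atomically or not at all (paths and values are illustrative):
> >
> > import java.util.Arrays;
> > import org.apache.zookeeper.CreateMode;
> > import org.apache.zookeeper.KeeperException;
> > import org.apache.zookeeper.Op;
> > import org.apache.zookeeper.ZooDefs;
> > import org.apache.zookeeper.ZooKeeper;
> >
> > class MultiExample {
> >     // Either both operations commit, or neither does.
> >     static void atomicPair(ZooKeeper zk) throws KeeperException, InterruptedException {
> >         zk.multi(Arrays.asList(
> >                 Op.setData("/app/config", "v2".getBytes(), -1),
> >                 Op.create("/app/config-history/v2", "v2".getBytes(),
> >                         ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)));
> >     }
> > }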
> >
> > On Sat, Jan 2, 2016 at 10:45 PM, Alexander Shraer <shralex@gmail.com>
> wrote:
> >
> >> for 1, see the chubby paper, section 2.4:
> >> http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf
> >> for 2, I'm not sure I fully understand the question, but essentially, ZK
> >> guarantees that the consistency of updates is preserved even during
> >> failures. The user doesn't need to do anything in particular to guarantee
> >> this, even during leader failures. In such a case, some suffix of the
> >> operations executed by the leader may be lost if they weren't previously
> >> acked by a majority. However, none of these operations could have been
> >> visible to reads.
> >>
> >> On Fri, Jan 1, 2016 at 12:29 AM, powell molleti <
> >> powellm79@yahoo.com.invalid> wrote:
> >>
> >>> Hi Janmejay,
> >>>
> >>> Regarding question 1, if a node takes a lock and the lock has timed out
> >>> from the system's perspective, that can mean other nodes are free to take
> >>> the lock and work on the resource. Hence the history could be well into
> >>> the future by the time the previous node discovers the timeout. Whether
> >>> rollback is needed in that context depends on the implementation details:
> >>> is the lock holder updating some common area? Then there could be
> >>> corruption, since other nodes are free to write in parallel with the
> >>> first one. In the usual sense, a timeout of a held lock means the node
> >>> which held the lock is dead. It is up to the implementation to ensure
> >>> that this is the case; with that guarantee, a timeout means the other
> >>> nodes can be sure no one else is working on the resource and hence can
> >>> move forward.
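> >>>
> >>> (To make "ensure this" concrete: one well-known trick, not something ZK
> >>> hands you directly, is a fencing token, e.g. the lock znode's version or
> >>> czxid, that the shared resource itself checks, so a late write from a
> >>> timed-out holder is rejected. A toy sketch, with all names made up:)
> >>>
> >>> import java.util.concurrent.atomic.AtomicLong;
> >>>
> >>> class FencedResource {
> >>>     // Highest fencing token seen; lock holders get monotonically increasing tokens.
> >>>     private final AtomicLong highestToken = new AtomicLong(-1);
> >>>
> >>>     // Returns false (write rejected) if the caller's token is stale, i.e. a
> >>>     // newer lock holder has already written to this resource.
> >>>     boolean write(long fencingToken, byte[] data) {
> >>>         long seen = highestToken.get();
> >>>         while (fencingToken >= seen) {
> >>>             if (highestToken.compareAndSet(seen, fencingToken)) {
> >>>                 apply(data);             // safe: we hold the newest token
> >>>                 return true;
> >>>             }
> >>>             seen = highestToken.get();   // someone raced us; re-check
> >>>         }
> >>>         return false;                    // stale token from a timed-out holder
> >>>     }
> >>>
> >>>     private void apply(byte[] data) { /* perform the actual update */ }
> >>> }
> >>>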
> >>> Question 2 seems to imply the assumption that the leader has significant
> >>> work to do and that leader change is quite common, which seems contrary
> >>> to the common implementation pattern. If the work can be broken down into
> >>> smaller chunks which need to be serialized separately, then each
> >>> chunk/work type can have a different leader.
> >>> For question 3, ZK does support auth and encryption for client
> >>> connections but not for inter-ZK-node channels. Do you have a requirement
> >>> to secure the inter-ZK-node channels? Can you let us know what your
> >>> requirements are, so we can implement a solution to fit all needs?
> >>> For question 4, the official implementation is C; people tend to wrap it
> >>> with C++, and there should be projects that use ZK which do that. You can
> >>> look them up and see if you can separate that code out and use it.
> >>>
> >>> Hope this helps.
> >>> Powell.
> >>>
> >>>
> >>>
> >>>     On Thursday, December 31, 2015 8:07 AM, Edward Capriolo <
> >>> edward.capriolo@huffingtonpost.com> wrote:
> >>>
> >>>
> >>> Q: What is the best way of handling distributed-lock expiry? The owner
> >>> of the lock managed to acquire it and may be in the middle of some
> >>> computation when the session expires or the lock expires.
> >>>
> >>> If you are using Java, a way I can see doing this is by using the
> >>> ExecutorCompletionService:
> >>> https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorCompletionService.html
> >>> It allows you to keep your workers in a group; you can poll the group
> >>> and provide cancel semantics as needed.
> >>> An example of that service is here:
> >>> https://github.com/edwardcapriolo/nibiru/blob/master/src/main/java/io/teknek/nibiru/coordinator/EventualCoordinator.java
> >>> where I am issuing multiple reads and I want to abandon the process if
> >>> they do not finish within a certain time. Many async/promise frameworks do
> >>> this by launching two tasks, a ComputationTask and a TimeoutTask that
> >>> returns in 10 seconds, and then asking the completion service to poll. If
> >>> the completion service hands back the TimeoutTask first, that means the
> >>> computation did not finish in time.
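> >>>
> >>> A stripped-down sketch of that poll-with-timeout idea (not the exact code
> >>> from the link above, just an illustration):
> >>>
> >>> import java.util.concurrent.*;
> >>>
> >>> class TimedComputation {
> >>>     static String runWithTimeout(Callable<String> work) throws InterruptedException {
> >>>         ExecutorService pool = Executors.newSingleThreadExecutor();
> >>>         CompletionService<String> cs = new ExecutorCompletionService<>(pool);
> >>>         Future<String> future = cs.submit(work);
> >>>         try {
> >>>             // poll() returns null if nothing completed within the timeout
> >>>             Future<String> done = cs.poll(10, TimeUnit.SECONDS);
> >>>             if (done == null) {
> >>>                 future.cancel(true);     // abandon the computation
> >>>                 return null;
> >>>             }
> >>>             return done.get();
> >>>         } catch (ExecutionException e) {
> >>>             return null;                 // the computation itself failed
> >>>         } finally {
> >>>             pool.shutdownNow();
> >>>         }
> >>>     }
> >>> }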
> >>>
> >>> Do people generally take action in the middle of the computation (abort
> >>> it and do it in a clever way such that the effect appears atomic, so the
> >>> abort is not really visible; if so, what are some of those clever ways)?
> >>>
> >>> The base issue is that Java's synchronized / AtomicReference do not have
> >>> a rollback.
> >>>
> >>> There are a few ways I know to work around this. Clojure has STM (Software
> >>> Transactional Memory) such that if an exception is thrown inside a dosync,
> >>> none of the writes inside the critical block ever happened. This assumes
> >>> you're using all Clojure structures, which you are probably not.
> >>> A way co-workers have done this is as follows: move your entire
> >>> transactional state into a SINGLE big object that you can
> >>> copy/mutate/compare-and-swap. You never need to roll back each piece
> >>> because you're changing the clone up until the point you commit it.
> >>> Writing reversal code is possible depending on the problem. There are
> >>> questions to ask like "what if the reversal somehow fails?"
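> >>>
> >>> A toy illustration of that single-big-object / compare-and-swap approach
> >>> (nothing ZK-specific, just plain java.util.concurrent):
> >>>
> >>> import java.util.concurrent.atomic.AtomicReference;
> >>>
> >>> class BigState {
> >>>     // All transactional state lives in one immutable object.
> >>>     static final class State {
> >>>         final long version;
> >>>         final String data;
> >>>         State(long version, String data) { this.version = version; this.data = data; }
> >>>     }
> >>>
> >>>     private final AtomicReference<State> ref = new AtomicReference<>(new State(0, ""));
> >>>
> >>>     void update(String newData) {
> >>>         while (true) {
> >>>             State old = ref.get();
> >>>             State next = new State(old.version + 1, newData); // change the copy
> >>>             if (ref.compareAndSet(old, next)) {
> >>>                 return;                                       // commit; nothing to roll back
> >>>             }
> >>>             // lost the race; retry against the fresh state
> >>>         }
> >>>     }
> >>> }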
> >>>
> >>>
> >>>
> >>>
> >>> On Thu, Dec 31, 2015 at 3:10 AM, singh.janmejay <
> singh.janmejay@gmail.com
> >>> >
> >>> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> > Was wondering if there are any reference designs or patterns for
> >>> > handling common operations involving distributed coordination.
> >>> >
> >>> > I have a few questions, and I guess they must have been asked before;
> >>> > I am unsure what to search for to surface the right answers. It'll be
> >>> > really valuable if someone can provide links to a relevant
> >>> > "best-practices guide" or suggestions per question, or share some
> >>> > wisdom or ideas on patterns for doing this the best way.
> >>> >
> >>> > 1. What is the best way of handling distributed-lock expiry? The owner
> >>> > of the lock managed to acquire it and may be in the middle of some
> >>> > computation when the session expires or the lock expires. When it
> >>> > finishes that computation, it can tell that the lock expired, but do
> >>> > people generally take action in the middle of the computation (abort it
> >>> > and do it in a clever way such that the effect appears atomic, so the
> >>> > abort is not really visible; if so, what are some of those clever
> >>> > ways)? Or is the right thing to do to write reversal code, such that
> >>> > operations can be cleanly undone in case the verification at the end of
> >>> > the computation shows that the lock expired? The latter is obviously a
> >>> > lot harder to achieve.
> >>> >
> >>> > 2. Same as above for leader-election scenarios. The leader generally
> >>> > administers operations on data systems that take significant time to
> >>> > complete and have significant resource overhead, and an RPC to
> >>> > administer such operations synchronously from leader to data node can't
> >>> > be atomic and can't be made latency-resilient to such a degree that
> >>> > issuing an operation across a large set of nodes in a cluster can be
> >>> > guaranteed to finish without a leader change. What do people generally
> >>> > do in such situations? How are timeouts handled for operations that are
> >>> > issued using sequential znodes as a per-datanode dedicated queue? How
> >>> > well does it scale, and what are some things to watch out for
> >>> > (operation size, encoding, clustering into one znode for atomicity,
> >>> > etc.)? And how are atomic operations that need to be issued across
> >>> > multiple data nodes managed (do they have to be combined into one
> >>> > znode)?
> >>> >
> >>> > 3. How do people secure ZooKeeper-based services? Is
> >>> > client-certificate verification the recommended way? How well does
> >>> > this work with the C client? Is inter-ZK-node communication done with
> >>> > X509 auth too?
> >>> >
> >>> > 4. What other projects, reference implementations, or libraries should
> >>> > I look at for working with the C client?
> >>> >
> >>> > Most of what I have asked revolves around the leader or lock owner
> >>> > having a false failure (where it doesn't know that the coordinator
> >>> > thinks it has failed).
> >>> >
> >>> > --
> >>> > Regards,
> >>> > Janmejay
> >>> > http://codehunk.wordpress.com
> >>> >
> >>>
> >>>
> >>>
> >>>
> >>
> >>
>
>
>
> --
> Regards,
> Janmejay
> http://codehunk.wordpress.com
>
