zookeeper-user mailing list archives

From "singh.janmejay" <singh.janme...@gmail.com>
Subject Re: Best-practice guides on coordination of operations in distributed systems (and some C client specific questions)
Date Tue, 12 Jan 2016 11:49:11 GMT

On Tue, Jan 5, 2016 at 2:30 PM, singh.janmejay <singh.janmejay@gmail.com> wrote:
> Thanks for the replies everyone, most of it was very useful.
> @Alexander: The section of the chubby paper you pointed me to addresses
> exactly what I was looking for. That is clearly one good way of doing
> it. I'm also thinking of an alternative approach and could use a
> review or some feedback on it.
> @Powell: About x509 auth on intra-cluster communication, I don't have a
> blocking need for it, as it can be achieved by setting up firewall
> rules to accept connections only from desired hosts. It may be a
> good-to-have though, because in cloud-based scenarios where IP
> addresses are re-used, a recycled IP can still talk to a secure
> zk-cluster unless the config is changed to remove the old peer IP and
> replace it with the new one. It's clearly a corner case though.
> Here is the approach I'm thinking of:
> - Implement all operations (at least master-triggered operations) on
> operand machines idempotently
> - Have master journal these operations to ZK before issuing RPC
> - In case master fails with some of these operations in flight, the
> newly elected master will need to read all issued (but not retired
> yet) operations and issue them again.
> - Existing master(before failure or after failure) can retry and
> retire operations according to whatever the retry policy and
> success-criterion is.
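The journal/issue/retire life-cycle in the bullets above could be sketched roughly as follows. This is a purely illustrative in-memory sketch (the `OpJournal` name and its methods are hypothetical, not from the thread); a real master would persist each entry as a znode in ZK rather than in a map:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the journal-before-RPC pattern described above.
// A real master would write these entries to ZK; a map stands in here.
class OpJournal {
    enum State { ISSUED, RETIRED }
    private final Map<String, State> entries = new LinkedHashMap<>();

    // Journal the operation durably *before* issuing the RPC.
    void journal(String opId) { entries.put(opId, State.ISSUED); }

    // Mark the operation done once the (idempotent) RPC is acked.
    void retire(String opId) { entries.put(opId, State.RETIRED); }

    // A newly elected master re-reads the journal and re-issues
    // everything that was journaled but never retired. Because the
    // operations are idempotent, re-issuing an already-applied one
    // is harmless.
    List<String> unretired() {
        List<String> pending = new ArrayList<>();
        for (Map.Entry<String, State> e : entries.entrySet())
            if (e.getValue() == State.ISSUED) pending.add(e.getKey());
        return pending;
    }
}
```

The point of the sketch is only the ordering: journal first, then RPC, then retire, so a failover can always reconstruct the in-flight set.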
> Why I am considering this as opposed to going with chubby-style sequencer passing:
> - I need to implement idempotency regardless, because the recovery
> path (master dies after successfully executing an operation but
> before writing the ack to the coordination service) requires it. So
> the idempotent-implementation complexity is here to stay.
> - Sequencer passing would increase the surface-area of the
> architecture that is exposed to the coordination service for
> sequencer validation, which may bring a verification RPC into the
> data-plane in some cases.
> - The sequencer may expire after verification but before the ack, in
> which case the new master may not recognize the operation as
> consistent with its decisions (or previous decision path).
> Thoughts? Suggestions?
> On Sun, Jan 3, 2016 at 2:18 PM, Alexander Shraer <shralex@gmail.com> wrote:
>> regarding atomic multi-znode updates -- check out "multi" updates:
>> <http://tdunning.blogspot.com/2011/06/tour-of-multi-update-for-zookeeper.html>
>> On Sat, Jan 2, 2016 at 10:45 PM, Alexander Shraer <shralex@gmail.com> wrote:
>>> for 1, see the chubby paper
>>> <http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf>,
>>> section 2.4.
>>> for 2, I'm not sure I fully understand the question, but essentially,
>>> ZK guarantees that consistency of updates is preserved even during
>>> failures. The user doesn't need to do anything in particular to
>>> guarantee this, even during leader failures. In that case, some
>>> suffix of operations executed by the leader may be lost if they
>>> weren't previously acked by a majority. However, none of those
>>> operations could have been visible to reads.
>>> On Fri, Jan 1, 2016 at 12:29 AM, powell molleti <
>>> powellm79@yahoo.com.invalid> wrote:
>>>> Hi Janmejay,
>>>> Regarding question 1, if a node takes a lock and the lock has timed
>>>> out from the system's perspective, then other nodes are free to take
>>>> the lock and work on the resource. Hence the history could be well
>>>> into the future by the time the previous node discovers the time-out.
>>>> Whether a rollback is needed in this context depends on the
>>>> implementation details: is the lock holder updating some common area?
>>>> If so, there could be corruption, since other nodes are free to write
>>>> in parallel with the first one. In the usual sense, a time-out on a
>>>> held lock means the node which held it is dead. It is up to the
>>>> implementation to ensure this; given that, a timeout means the other
>>>> nodes can be sure that no one else is working on the resource and
>>>> hence can move forward.
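One common way to make the stale-lock-holder case above safe is a fencing token: the shared resource remembers the highest lock epoch it has seen and rejects writes from a holder whose lock has since been re-granted. A minimal illustrative sketch (the `FencedResource` name and the epoch scheme are assumptions, not something from the thread; in ZK the sequential-znode number used for the lock can serve as the epoch):

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative fencing sketch: the resource tracks the newest lock epoch
// it has observed, so a holder that discovers its timeout late cannot
// corrupt state with a stale write.
class FencedResource {
    private final AtomicLong highestEpoch = new AtomicLong(0);
    private String data = "";

    // Each lock acquisition carries a strictly increasing epoch.
    boolean write(long epoch, String value) {
        long seen = highestEpoch.get();
        if (epoch < seen) return false;          // stale holder: reject
        highestEpoch.compareAndSet(seen, epoch); // remember newest epoch
        data = value;
        return true;
    }

    String read() { return data; }
}
```

This only works when every path to the shared state goes through the epoch check, which is why Powell's point about "updating some common area" matters.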
>>>> Question 2 seems to assume that the leader has significant work to do
>>>> and that leader change is quite common, which is contrary to the
>>>> usual implementation pattern. If the work can be broken down into
>>>> smaller chunks, each of which needs its own serialization, then each
>>>> chunk/work type can have a different leader.
>>>> For question 3, ZK does support auth and encryption for client
>>>> connections but not for inter-ZK-node channels. Do you have a
>>>> requirement to secure the inter-ZK-node channels? Can you let us know
>>>> what your requirements are, so we can implement a solution that fits
>>>> all needs?
>>>> For question 4, the official implementation is the C client; people
>>>> tend to wrap it with C++. There are projects that use ZK this way;
>>>> you can look them up and see if you can separate the wrapper out and
>>>> reuse it.
>>>> Hope this helps. Powell.
>>>>     On Thursday, December 31, 2015 8:07 AM, Edward Capriolo <
>>>> edward.capriolo@huffingtonpost.com> wrote:
>>>> Q: What is the best way of handling distributed-lock expiry? The
>>>> owner of the lock managed to acquire it and may be in the middle of
>>>> some computation when the session or lock expires.
>>>> If you are using Java, one way I can see to do this is by using
>>>> ExecutorCompletionService:
>>>> https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorCompletionService.html
>>>> It allows you to keep your workers in a group; you can poll the group
>>>> and provide cancel semantics as needed.
>>>> An example of that service is here:
>>>> https://github.com/edwardcapriolo/nibiru/blob/master/src/main/java/io/teknek/nibiru/coordinator/EventualCoordinator.java
>>>> where I am issuing multiple reads and I want to abandon the process
>>>> if it does not finish within a while. Many async/promise frameworks
>>>> do this by launching two tasks, a ComputationTask and a TimeoutTask
>>>> that returns in 10 seconds, and then asking the completion service to
>>>> poll. If the service hands back the TimeoutTask after the timeout,
>>>> that means the computation did not finish in time.
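The polling pattern described above can also be expressed with `ExecutorCompletionService.poll(timeout, unit)` directly, which plays the role of the TimeoutTask. A self-contained sketch (the `TimedWork` wrapper is hypothetical):

```java
import java.util.concurrent.*;

// Sketch of the pattern above: submit the computation to an
// ExecutorCompletionService and poll with a deadline; a null poll result
// means the work did not finish in time and can be cancelled.
class TimedWork {
    static String runWithTimeout(Callable<String> task, long timeoutMs)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CompletionService<String> cs = new ExecutorCompletionService<>(pool);
            Future<String> f = cs.submit(task);
            Future<String> done = cs.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (done == null) {        // deadline passed: abandon the work
                f.cancel(true);
                return null;
            }
            return done.get();
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Note that cancellation here only interrupts the worker thread; whether the computation's side effects are safe to abandon is exactly the idempotency question discussed earlier in the thread.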
>>>> Do people generally take action in the middle of the computation
>>>> (abort it and do it in a clever way such that the effect appears
>>>> atomic, so the abort is not really visible; if so, what are some of
>>>> those clever ways)?
>>>> The base issue is that Java's synchronized / AtomicReference do not
>>>> have a rollback.
>>>> There are a few ways I know of to work around this. Clojure has STM
>>>> (Software Transactional Memory), such that if an exception is thrown
>>>> inside a dosync, none of the changes to refs inside the critical
>>>> block ever happened. This assumes you are using all Clojure
>>>> structures, which you are probably not.
>>>> A way coworkers have done this is as follows. Move your entire
>>>> transactional state into a SINGLE big object that you can
>>>> copy/mutate/compare-and-swap. You never need to roll back each piece,
>>>> because you are changing the clone up until the point you commit it.
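The single-big-object approach can be sketched with `AtomicReference.compareAndSet`; the `Counters` and `StateHolder` names below are illustrative:

```java
import java.util.concurrent.atomic.AtomicReference;

// All transactional state lives in one immutable object; writers build a
// modified copy and publish it atomically, so a failed or abandoned
// attempt needs no piecewise rollback -- the shared state never saw it.
class Counters {
    final int reads, writes;
    Counters(int reads, int writes) { this.reads = reads; this.writes = writes; }
}

class StateHolder {
    private final AtomicReference<Counters> state =
            new AtomicReference<>(new Counters(0, 0));

    void recordRead() {
        Counters cur, next;
        do {
            cur = state.get();
            next = new Counters(cur.reads + 1, cur.writes); // mutate a copy
        } while (!state.compareAndSet(cur, next));          // publish or retry
    }

    Counters snapshot() { return state.get(); }
}
```

If an exception is thrown before the compareAndSet, readers still see the previous consistent `Counters`, which is the "abort appears atomic" property the question asks about.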
>>>> Writing reversal code is possible depending on the problem. There are
>>>> questions to ask like "what if the reversal somehow fails?"
>>>> On Thu, Dec 31, 2015 at 3:10 AM, singh.janmejay <singh.janmejay@gmail.com
>>>> >
>>>> wrote:
>>>> > Hi,
>>>> >
>>>> > Was wondering if there are any reference designs, patterns on handling
>>>> > common operations involving distributed coordination.
>>>> >
>>>> > I have a few questions and I guess they must have been asked before,
>>>> > am unsure what to search for to surface the right answers. It'll be
>>>> > really valuable if someone can provide links to relevant
>>>> > "best-practices guide" or "suggestions" per question or share some
>>>> > wisdom or ideas on patterns to do this in the best way.
>>>> >
>>>> > 1. What is the best way of handling distributed-lock expiry? The
>>>> > owner of the lock managed to acquire it and may be in the middle of
>>>> > some computation when the session or lock expires. When it finishes
>>>> > that computation, it can tell that the lock expired, but do people
>>>> > generally take action in the middle of the computation (abort it
>>>> > and do it in a clever way such that the effect appears atomic, so
>>>> > the abort is not really visible; if so, what are some of those
>>>> > clever ways)? Or is the right thing to do to write reversal code,
>>>> > such that operations can be cleanly undone in case the verification
>>>> > at the end of the computation shows that the lock expired? The
>>>> > latter is obviously a lot harder to achieve.
>>>> >
>>>> > 2. Same as above for leader-election scenarios. The leader
>>>> > generally administers operations on data-systems that take
>>>> > significant time to complete and have significant resource
>>>> > overhead, and the RPC to administer such operations synchronously
>>>> > from leader to data-node can't be atomic and can't be made
>>>> > latency-resilient to such a degree that issuing an operation across
>>>> > a large set of nodes in a cluster can be guaranteed to finish
>>>> > without a leader change. What do people generally do in such
>>>> > situations? How are timeouts handled for operations issued using a
>>>> > sequential znode as a per-datanode dedicated queue? How well does
>>>> > that scale, and what are some things to watch out for (operation
>>>> > size, encoding, clustering into one znode for atomicity, etc.)? And
>>>> > how are atomic operations that need to be issued across multiple
>>>> > data-nodes managed (do they have to be clobbered into one znode)?
>>>> >
>>>> > 3. How do people secure zookeeper based services? Is
>>>> > client-certificate-verification the recommended way? How well does
>>>> > this work with C client? Is inter-zk-node communication done with
>>>> > X509-auth too?
>>>> >
>>>> > 4. What other projects, reference-implementations or libraries should
>>>> > I look at for working with C client?
>>>> >
>>>> > Most of what I have asked revolves around the leader or lock-owner
>>>> > having a false failure (where it doesn't know that the coordinator
>>>> > thinks it has failed).
>>>> >
>>>> > --
>>>> > Regards,
>>>> > Janmejay
>>>> > http://codehunk.wordpress.com
>>>> >
> --
> Regards,
> Janmejay
> http://codehunk.wordpress.com

