curator-user mailing list archives

From Foolish Ewe <foolish...@hotmail.com>
Subject Re: Can Curator's recipes for synchronization be used when the releasing entity is not the locking entity?
Date Mon, 30 Jan 2017 21:33:40 GMT
Hello Jordan:


Thank you for your thoughtful reply, and thanks also to Vitalii Tymchyshyn, whose response
may address some of my questions. TL;DR: if I understand correctly, the Curator API
design constrains the client that unlocks or returns a lease to be the same client
(and hence the same Java process) that acquired the lock/lease.


Let's consider the problem and try to develop some intuition and if needed formalism. First
let's consider the problem outside the Curator context and then ask if we can express it in
Curator/Zookeeper.


Suppose we have the following logic before we decorate it with synchronization/mutual exclusion:
we are given a collection of parallel workflows that all perform the following steps:



Step B) update SharedResource

Step C) read SharedResource (and other inputs) and Write Computed Results (to HDFS)

Step E) ProcessResults


It happens that, for our use case, Step 2) takes considerable time, and if
some workflow, say i, is in Step B) or Step C) while another workflow, say j, does Step B),
then job i will either fail and stop (if we are lucky) or produce (potentially undetectably)
corrupted output.

Thus we would like to guard the critical section, which is Step B) and Step C), with
mutual exclusion/synchronization. Let w denote the workflow id; then the revised job workflow
would look like the following:

Step A) Acquire exclusive access to the Shared resource for workflow w (reserve/lock the shared
resource)

Step B) update SharedResource

Step C) read SharedResource (and other inputs) and Write Computed Results (to HDFS)

Step D) Release/unlock the reservation of the Shared Resource for workflow w, making the Shared
Resource available for access by other workflows

Step E) ProcessResults

Since we aren't asking all the workflows to reach a particular point in execution,
it is unclear why I would use a synchronization barrier. To me, this looks like a traditional
mutual exclusion problem (i.e. at most one workflow is active in the critical section of Step
B or Step C).

The twist in my use case is that Step B) and Step C) are collections of one or more different
jobs scheduled by YARN, so we don't currently have a continuously running client-side
process that can host a listener. I was looking to see whether the off-the-shelf
recipes in Curator support this. My current understanding (from Vitalii's remarks
and the documentation) is that Curator's design assumes that the locking entity is in the
same Java process as the unlocking entity, and that it advocates a client-side
process running with a listener for correctness (e.g. recovery in the case of client
failure, perhaps other cases too?). But in our current system, Step A) and Step D) are
different jobs that share no JVM (i.e. they are distinct Java processes), and I was looking for
an appropriate approach for the unlock/returnLease in Step D) given that constraint.

I looked at the following candidate approaches under the constraint of not having a
continuously running Java process that both acquires and releases a lock (or acquires a
semaphore lease and returns it):


  *   Revocation: please correct me if I'm wrong, but my understanding is that the lock
holder needs to be listening for revocation requests and then needs to release its lock.
Revocation appears to be cooperative, so I would need a client-side listener in the locking
entity's Java process. Accommodating this would require some (potentially non-trivial)
refactoring of the workflow in order to have correct revocation request detection followed
by lock release.
  *   http://curator.apache.org/curator-recipes/shared-reentrant-lock.html - The unlock mechanism
requires that the JVM has a valid InterProcessMutex that has already acquired the lock before
doing a release() operation, so we have a chicken-and-egg situation here.
  *   http://curator.apache.org/curator-recipes/shared-semaphore.html - The Lease parameter
(obtained via the acquire method) of the returnLease method appears, at first glance, to require
that the same Java process perform both the locking and unlocking (unless the lease can be
serialized and transmitted by the locking entity, then received and deserialized by the unlocking
entity). Although the lease provides a way to mitigate crashed locking entities, there appears
to be a tradeoff: the lease improves recovery from crashed or failed clients but makes
the Curator semaphores seem less expressive than the traditional semaphore definition, which
does not have any analog of the lease. E.g. in producer/consumer problems, the unlocking entity
is distinct from the locking entity (which is why I mentioned it as a motivating example).
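
The distinction between an owned lock and a traditional (ownerless) semaphore can be illustrated
within a single JVM using java.util.concurrent. This is only an analogy, sketched under the
assumption that threads stand in for the locking and unlocking entities; it does not use
Curator or ZooKeeper, and the class and method names (OwnershipDemo and its helpers) are
made up for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

public class OwnershipDemo {

    /** Runs the task on a fresh thread and waits for it to finish. */
    static void runOnOtherThread(Runnable task) {
        Thread t = new Thread(task);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** A counting semaphore has no owner: any thread may release a permit. */
    static boolean semaphoreReleasedByOtherThread() {
        Semaphore semaphore = new Semaphore(1);
        semaphore.acquireUninterruptibly();       // the "locking entity" takes the only permit
        runOnOtherThread(semaphore::release);     // a different thread gives it back
        return semaphore.availablePermits() == 1;
    }

    /** An owned lock rejects unlock() from a thread that does not hold it. */
    static boolean lockRejectsForeignUnlock() {
        ReentrantLock lock = new ReentrantLock();
        lock.lock();                              // held by the calling thread
        boolean[] rejected = {false};
        runOnOtherThread(() -> {
            try {
                lock.unlock();                    // attempted by a non-owner thread
            } catch (IllegalMonitorStateException e) {
                rejected[0] = true;
            }
        });
        lock.unlock();                            // the owner releases normally
        return rejected[0];
    }

    public static void main(String[] args) {
        System.out.println("semaphore released by other thread: "
                + semaphoreReleasedByOtherThread());
        System.out.println("lock rejected foreign unlock: "
                + lockRejectsForeignUnlock());
    }
}
```

The semaphore case corresponds to the producer/consumer pattern mentioned above, where the
releasing party never acquired anything; the lock case corresponds to the ownership rule
that the documented release() semantics describe.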

This seems to imply that I need to look at the cost of modifying the workflow design and see
if I can meet the constraint or consider other approaches.

With best regards:

Bill


________________________________
From: Jordan Zimmerman <jordan@jordanzimmerman.com>
Sent: Thursday, January 26, 2017 5:05 AM
To: user@curator.apache.org
Subject: Re: Can Curator's recipes for synchronization be used when the releasing entity is
not the locking entity?

I read the description several times and, sadly, don’t understand. Maybe someone else? At
first blush it almost sounds like a barrier or double barrier: http://curator.apache.org/curator-recipes/barrier.html
or http://curator.apache.org/curator-recipes/double-barrier.html. But, then, I don’t totally
understand. Another thing: Curator InterProcessMutex can be revoked from another process.
See http://curator.apache.org/curator-recipes/shared-reentrant-lock.html “Revoking” -
maybe that’s what you want? Other than that, maybe you can restate the problem or give more
details.

-Jordan

On Jan 25, 2017, at 6:03 PM, Foolish Ewe <foolishewe@hotmail.com> wrote:

Hello All:

I would like to use Curator to synchronize mutually exclusive access to a shared resource;
however, the entity that wants to release a lock is distinct from the locking entity (i.e.
they are in different JVMs on different machines). Such cases can occur in practice (e.g.
producer/consumer synchronization, though this isn't quite my use case). Informally, I would
like to have operations that behave like the following in a JVM-based language:

  1.  Strict requirements:
     *   acquire(resourceId, taskId) - Have the task waiting for the resource suspend until
it has mutually exclusive access (i.e. acquires the lock) or throw an exception if the request
is somehow invalid (i.e. bad resource Id, bad task Id, internal error, etc).
     *   release(resourceId) - Given a resource, if there is an acquired lock, release that
lock and wake up the next task (in FCFS order) waiting to acquire the lock, if one exists.
  2.  Nice to have (useful for maintenance, etc.):
     *   status(resourceId) - Report if the resource is locked, the current taskId of the
acquirer if the lock is acquired and the (potentially empty)  FCFS list of tasks waiting to
acquire the lock.
     *   releaseAll(resourceId)  - remove all pending locks on this resource
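
The intended semantics of the four operations above can be modelled in a few lines. The
following is a minimal single-process, in-memory sketch only: it is not a Curator recipe and
not a distributed implementation, the class name FcfsResourceQueue is hypothetical, and
acquire is simplified to return immediately (true if the task now holds the lock, false if
it was queued) instead of suspending the caller.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FcfsResourceQueue {
    // resourceId -> FCFS queue of taskIds; the head of each queue holds the lock.
    private final Map<String, Deque<String>> queues = new HashMap<>();

    /** Enqueues taskId; returns true if it now holds the lock, false if it must wait. */
    public synchronized boolean acquire(String resourceId, String taskId) {
        if (resourceId == null || taskId == null) {
            throw new IllegalArgumentException("resourceId and taskId are required");
        }
        Deque<String> queue = queues.computeIfAbsent(resourceId, k -> new ArrayDeque<>());
        queue.addLast(taskId);
        return queue.size() == 1;
    }

    /** Releases the current holder regardless of caller; returns the next holder or null. */
    public synchronized String release(String resourceId) {
        Deque<String> queue = queues.get(resourceId);
        if (queue == null || queue.isEmpty()) {
            return null;
        }
        queue.pollFirst();          // drop the current holder
        return queue.peekFirst();   // the next task in FCFS order, if any
    }

    /** Holder first, then waiters in FCFS order; empty if the resource is unlocked. */
    public synchronized List<String> status(String resourceId) {
        Deque<String> queue = queues.get(resourceId);
        return queue == null ? new ArrayList<>() : new ArrayList<>(queue);
    }

    /** Removes the holder and all pending requests for the resource. */
    public synchronized void releaseAll(String resourceId) {
        queues.remove(resourceId);
    }

    public static void main(String[] args) {
        FcfsResourceQueue locks = new FcfsResourceQueue();
        System.out.println(locks.acquire("shared", "workflow-1"));
        System.out.println(locks.acquire("shared", "workflow-2"));
        System.out.println(locks.status("shared"));
        System.out.println(locks.release("shared"));
    }
}
```

Note that release takes only the resource id, so the caller does not need any handle
obtained at acquire time; that is the property the question is about.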

However, the semantics of the recipes I've looked at seem to indicate that the releasing entity
must have a handle (either explicit or implicit) of the lease/lock, e.g.


  *   http://curator.apache.org/curator-recipes/shared-reentrant-lock.html states:

public void release()
Perform one release of the mutex if the calling thread is the same thread that acquired it.
If the thread had made multiple calls to acquire, the mutex will still be held when this
method returns.



  *   http://curator.apache.org/curator-recipes/shared-semaphore.html states:

Lease instances can either be closed directly or you can use these convenience methods:

public void returnAll(Collection<Lease> leases)
public void returnLease(Lease lease)

So it appears on the surface that the same entity that acquires a mutex
or a semaphore lease is expected to release the mutex or return the lease.

My questions are:

  1.  Am I misunderstanding how Curator works?
  2.  Is there a more appropriate abstraction in Curator for my use case?
  3.  Can I use one of the existing recipes?  Could a releasing entity return a lease if they
had a serialized copy of the lease but weren't the entity acquiring the lease?
  4.  If I need to roll my own, should the Curator Framework be able to help here, or should
I work at the raw ZooKeeper level for this use case?

Thanks for your help with this:

Bill

