zookeeper-user mailing list archives

From John Sirois <john.sir...@gmail.com>
Subject Re: Work Sharing with Consistent Hashing
Date Wed, 09 Sep 2015 20:49:14 GMT
You may have missed this thread:
http://mail-archives.apache.org/mod_mbox/zookeeper-user/201509.mbox/%3CCAF+A=7rhAMcGxewhPR0n+z5-RwVxSLxd1ghTTH7CP3ozMVgdRg@mail.gmail.com%3E

What you describe is implemented, perhaps differing in some details,
here: http://nirmataoss.github.io/workflow/

On Wed, Sep 9, 2015 at 2:29 PM, Simon <cocoa@gmx.ch> wrote:

> Hi
>
> I have been reading a lot about Zookeeper lately because we have the
> requirement to distribute our workflow engine (for performance and
> reliability reasons). Currently, we have a single instance. If it fails, no
> work is done anymore. One approach mentioned on this mailing list and in
> online documents is a master-worker setup. The master is made reliable by
> running multiple instances and electing a leader. This setup might work
> unless there are too many events to be dispatched by a single master. I am
> a bit concerned about that because the work dispatcher currently does no I/O
> and is blazing fast. If the work dispatcher (scheduler) now has to communicate
> over the network, it might become too slow. So I thought about a different
> solution and would like to hear what you think about it.
>
> Let’s say each workflow engine instance (referred to as a node from now on)
> registers an ephemeral znode under /nodes. Each node installs a children
> watch on /nodes. The nodes use this information to populate a hashring.
> Each node is only responsible for workflow instances that map to their
> corresponding part of the hashring. Event notifications would then be
> dispatched to the correct node based on the same hashring. The persistence
> of workflow instances and events would still be done in a highly available
> database. Only notifications about events would be dispatched. Whenever an
> event notification is received, a workflow instance has to be dispatched to
> a given worker thread within a node.
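>
> A rough sketch of the registration and membership watch, using the plain
> ZooKeeper Java client (the /nodes parent is assumed to exist as a persistent
> znode; class and helper names are just placeholders):
>
>     import java.net.InetAddress;
>     import java.util.List;
>     import org.apache.zookeeper.*;
>
>     class NodeRegistration {
>         // Register this engine instance as an ephemeral child of /nodes and
>         // fetch the current membership with a children watch installed.
>         static List<String> register(ZooKeeper zk) throws Exception {
>             String nodeId = InetAddress.getLocalHost().getHostName();
>             zk.create("/nodes/" + nodeId, new byte[0],
>                       ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
>             // The watch fires only once: the handler has to re-read /nodes
>             // (which re-registers the watch) and rebuild the hash ring.
>             return zk.getChildren("/nodes", event -> {
>                 if (event.getType() == Watcher.Event.EventType.NodeChildrenChanged) {
>                     // rebuild the hash ring from the new membership (placeholder)
>                 }
>             });
>         }
>     }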
>
> The interesting cases happen if a new node is added to the cluster or if a
> workflow engine instance fails. Let’s talk about failures first. As
> processing a workflow instance takes some time, we cannot simply switch over
> to a new instance right away. After all, the previous node might still be
> running (i.e. it did not crash, just failed to heartbeat). We would have to
> wait some time (how long?!) until we know that the failing node has dropped
> the work it was doing (it gets notified of the session expiration). The
> failing node cannot communicate with Zookeeper so it has no way of telling
> that it was done. We also cannot use the central database to find out what
> happened. If a node fails (e.g. due to a hardware or network failure), it
> cannot update the database. The database will look the same regardless
> of whether it is working or has crashed. It must be impossible that two
> workflow engines work on the same workflow instance. This would result in
> duplicate messages being sent out to backend systems (not a good idea).
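>
> On the failing node’s side, the reaction to the session events mentioned
> above could look roughly like this (WorkerPool is a made-up handle for the
> local worker threads; note that a fully partitioned client only learns about
> the expiration once it reconnects, so Disconnected is the earlier signal to
> stop taking on new work):
>
>     import org.apache.zookeeper.WatchedEvent;
>     import org.apache.zookeeper.Watcher;
>
>     interface WorkerPool { void pause(); void resume(); void stopAll(); }
>
>     // Passed as the default watcher when the ZooKeeper handle is created.
>     class SessionWatcher implements Watcher {
>         private final WorkerPool workers;
>
>         SessionWatcher(WorkerPool workers) { this.workers = workers; }
>
>         @Override
>         public void process(WatchedEvent event) {
>             if (event.getType() != Event.EventType.None) {
>                 return; // only connection/session state changes matter here
>             }
>             switch (event.getState()) {
>                 case Disconnected:  workers.pause();   break; // stop taking new work
>                 case Expired:       workers.stopAll(); break; // our slice gets reassigned
>                 case SyncConnected: workers.resume();  break;
>                 default:            break;
>             }
>         }
>     }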
>
> The same issue arises when adding a node. The new node has to wait until
> the other nodes have stopped working. OK, I could use ZooKeeper in this
> case to lock workflow instances. Only if the lock is available can a node
> start working on a workflow instance. Whenever a node is added, it will
> look up pending workflow instances in the database (those that have open
> events).
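>
> For the locking itself, the simplest thing I can think of is an ephemeral
> znode per workflow instance (a rough sketch; /locks is assumed to exist as a
> persistent znode, and the lock of course disappears together with the session
> that created it):
>
>     import org.apache.zookeeper.CreateMode;
>     import org.apache.zookeeper.KeeperException;
>     import org.apache.zookeeper.ZooDefs;
>     import org.apache.zookeeper.ZooKeeper;
>
>     class InstanceLock {
>         // Only the node that managed to create the ephemeral znode may work
>         // on the corresponding workflow instance.
>         static boolean tryLock(ZooKeeper zk, String workflowId) throws Exception {
>             try {
>                 zk.create("/locks/" + workflowId, new byte[0],
>                           ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
>                 return true;
>             } catch (KeeperException.NodeExistsException e) {
>                 return false; // another node is still working on this instance
>             }
>         }
>     }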
>
> To summarize: instead of having a central master that coordinates all work,
> I would like to divide the work “space” into segments by using consistent
> hashing. Each node is an equal peer and can operate freely within its
> assigned work “space”.
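>
> The consistent hashing itself would just be a sorted ring of hashes; a rough
> sketch of how each node could map a workflow instance id to its owner (the
> number of virtual nodes per engine instance is a tuning knob):
>
>     import java.nio.ByteBuffer;
>     import java.nio.charset.StandardCharsets;
>     import java.security.MessageDigest;
>     import java.util.List;
>     import java.util.SortedMap;
>     import java.util.TreeMap;
>
>     class HashRing {
>         private final TreeMap<Long, String> ring = new TreeMap<>();
>
>         HashRing(List<String> nodes, int virtualNodes) {
>             for (String node : nodes)
>                 for (int i = 0; i < virtualNodes; i++)
>                     ring.put(hash(node + "#" + i), node);
>         }
>
>         // The node responsible for a given workflow instance id.
>         String ownerOf(String workflowId) {
>             SortedMap<Long, String> tail = ring.tailMap(hash(workflowId));
>             return tail.isEmpty() ? ring.firstEntry().getValue()
>                                   : tail.get(tail.firstKey());
>         }
>
>         private static long hash(String key) {
>             try {
>                 byte[] digest = MessageDigest.getInstance("MD5")
>                         .digest(key.getBytes(StandardCharsets.UTF_8));
>                 return ByteBuffer.wrap(digest).getLong();
>             } catch (java.security.NoSuchAlgorithmException e) {
>                 throw new IllegalStateException(e);
>             }
>         }
>     }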
>
> What do you think about that setup? Is it something completely stupid?! If
> yes, let me know why. Has somebody done something similar successfully?
> What would I have to watch out for? What would I need to add to the basic
> setup mentioned above?
>
> Regards,
> Simon
