zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: queue with limit to number of simultaneous tasks
Date Tue, 14 Jul 2009 17:53:56 GMT
As you look at this, I would be grateful if you can evaluate alternative
implementations in which

a) each task is a separate file


b) all tasks are listed and described in a single file that is updated
atomically using standard ZK read-modify-write-repeat-on-failure style


c) all tasks are listed in a single file, but their descriptions are kept in
separate files whose names are in the single file.  Atomic updates are
applied to the single file; task files are cleaned up as well as possible.
Task files that are not deleted in good order (should be exceedingly rare)
can be recognized by the lack of a reference from the single control file.
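
To make the read-modify-write-repeat-on-failure style concrete, here is a
minimal sketch of option (c) in Python.  It runs against an in-memory
stand-in rather than a real ZooKeeper client: FakeStore, BadVersion,
update_control, add_task, take_task and the "/tasks/" path layout are all
hypothetical names invented for this illustration.  The only ZooKeeper
behaviour it assumes is that every node carries a version and that a write
can be made conditional on that version.

```python
import json


class BadVersion(Exception):
    """Raised when a conditional write is attempted with a stale version."""


class FakeStore:
    """In-memory stand-in for a store with versioned nodes (hypothetical)."""

    def __init__(self):
        self.nodes = {}  # path -> (data bytes, version)

    def create(self, path, data):
        if path in self.nodes:
            raise KeyError(path)
        self.nodes[path] = (data, 0)

    def get(self, path):
        return self.nodes[path]

    def set(self, path, data, version):
        _, current = self.nodes[path]
        if current != version:
            raise BadVersion(path)
        self.nodes[path] = (data, current + 1)

    def delete(self, path):
        del self.nodes[path]


def update_control(store, path, mutate):
    """Read, modify, write; repeat whenever the conditional write fails."""
    while True:
        data, version = store.get(path)
        tasks = json.loads(data)
        result = mutate(tasks)
        try:
            store.set(path, json.dumps(tasks).encode(), version)
            return result
        except BadVersion:
            continue  # another client updated first: re-read and retry


def add_task(store, control, name, description):
    # Write the description first so the control file never names a missing file.
    store.create("/tasks/" + name, description)
    update_control(store, control, lambda tasks: tasks.append(name))


def take_task(store, control):
    name = update_control(store, control,
                          lambda tasks: tasks.pop(0) if tasks else None)
    if name is None:
        return None
    description, _ = store.get("/tasks/" + name)
    # A crash before the next line leaves an unreferenced task file behind.
    store.delete("/tasks/" + name)
    return name, description
```

Selecting a task here costs one read and one conditional write of the
control file, however many tasks are pending, plus one read and one delete
of the task file.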

The trade-offs here appear with large numbers of running tasks, large
numbers of pending tasks, or very high task churn rates.  Option (a) becomes
very bad with many pending tasks because selecting a task may take a number
of server round trips proportional to the number of pending tasks.  Option
(b) might exceed the maximum file size for a moderate number of tasks.
Option (c) seems safe except for the occasional need for garbage cleanup if
programs fail between updating the control file and deleting the task files.
Mostly people talk about (a), but (c) seems very competitive to me.
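
The garbage cleanup for option (c) could look roughly like the Python sketch
below.  The function name find_orphans, its parameters and the grace period
are assumptions made for this illustration only.  The grace period is there
because a task file that was written just before its name reaches the
control file would otherwise look like garbage.

```python
def find_orphans(referenced, task_files, now, grace_seconds=60.0):
    """Return names of task files that the control file does not reference.

    referenced    -- task names currently listed in the control file
    task_files    -- dict mapping task file name -> creation time in seconds
    now           -- current time in seconds, same clock as the creation times
    grace_seconds -- files younger than this are skipped (assumed safety margin)
    """
    live = set(referenced)
    return sorted(
        name
        for name, created in task_files.items()
        if name not in live and now - created >= grace_seconds
    )
```

A cleanup pass would read the control file once, list the task files once,
and delete whatever find_orphans returns.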

All of these alternatives simply implement the "look for" verb in Patrick's
excellent outline.  What he suggests for task working convention is quite

On Tue, Jul 14, 2009 at 9:45 AM, Patrick Hunt <phunt@apache.org> wrote:

> 1) your task processors look for the first available task
> 2) if found they create an ephemeral node as a child of the task node
>  (if the processor dies the ephemeral node will be removed)
> 3) the processor processes the task then deletes the task when "done"
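
The three steps quoted above can be sketched as follows.  This is Python
against an in-memory stand-in, not a real ZooKeeper session: FakeZk,
claim_first_available, finish and the session ids are hypothetical names
used only for illustration.  With a real server the claim in step 2 would be
an ephemeral node created under the task node, and a failed create would
indicate that another processor got there first.

```python
class FakeZk:
    """In-memory stand-in: ephemeral claims vanish when their session closes."""

    def __init__(self):
        self.tasks = {}   # task name -> payload; insertion order is queue order
        self.claims = {}  # task name -> session id holding the ephemeral claim

    def close_session(self, session):
        # Mimics removal of ephemeral nodes when a processor dies.
        self.claims = {t: s for t, s in self.claims.items() if s != session}


def claim_first_available(zk, session):
    """Steps 1 and 2: find the first unclaimed task and claim it."""
    for name in zk.tasks:
        if name not in zk.claims:
            zk.claims[name] = session
            return name
    return None


def finish(zk, name):
    """Step 3: remove the claim, then the task, once processing is done."""
    # Claim first: ZooKeeper refuses to delete a node that still has children.
    zk.claims.pop(name, None)
    zk.tasks.pop(name)
```

Note that claim_first_available scans the pending tasks one by one, which is
the cost that grows with the number of pending tasks in option (a).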
