hadoop-zookeeper-user mailing list archives

From Zheng Shao <zsh...@gmail.com>
Subject Using zookeeper to assign a bunch of long-running tasks to nodes (without unhandled tasks and double-handled tasks)
Date Sat, 23 Jan 2010 08:58:02 GMT
Let's say I have 100 long-running tasks and 20 nodes.
I want each node to take up to 10 tasks. Each task should be
taken by one and only one node.

Will the following solution solve the problem?

Create a directory "/mytasks" in ZooKeeper.
Normally there will be 100 EPHEMERAL children in the /mytasks directory,
named from "0" to "99".
The data of each child will be the name of the node and the process id on
that node. This data is optional, but it allows us to look up the node and
process id for a given task.
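
A minimal sketch of that optional payload, assuming a plain "host:pid" string encoding. The format and the helper names encode_owner, decode_owner and current_owner are illustrative assumptions; ZooKeeper itself only stores opaque bytes.

```python
import os
import socket
from typing import Tuple


def encode_owner(host: str, pid: int) -> bytes:
    """Pack the owner of a task as b"host:pid" (hypothetical format)."""
    return f"{host}:{pid}".encode("utf-8")


def decode_owner(data: bytes) -> Tuple[str, int]:
    """Reverse of encode_owner; splits on the last ':' so the host may contain colons."""
    host, _, pid = data.decode("utf-8").rpartition(":")
    return host, int(pid)


def current_owner() -> bytes:
    """Payload describing the calling process."""
    return encode_owner(socket.gethostname(), os.getpid())
```

Reading the data of child "42" and passing it to decode_owner would then give the node name and process id handling task 42.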


Each node will start 10 processes.
  Each process will list the directory "/mytasks" with a watcher.
  If triggered by the watcher, we relist the directory.
  If we find that some files in the range "0" to "99" are missing, we
try to create one of them as an EPHEMERAL node with the no-overwrite option.
    If the creation is successful, then we disable the watcher and
start processing the corresponding task (if something goes wrong, the
process just kills itself and its EPHEMERAL node will be gone).
    If not, we go back to waiting for the watcher.
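
The claim step above can be sketched as a small simulation. This is not a ZooKeeper client: FakeTaskDir is a hypothetical in-memory stand-in whose create() fails if the name already exists (the no-overwrite behaviour), and expire() imitates the cleanup of EPHEMERAL children when their owner dies. The watcher is replaced by simply relisting after a lost race. All names here (FakeTaskDir, claim_one, simulate) are assumptions made for illustration.

```python
import threading
from typing import Dict, List, Optional, Tuple


class NodeExistsError(Exception):
    """Raised when create() targets a name that already exists."""


class FakeTaskDir:
    """In-memory stand-in for the children of /mytasks (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._children: Dict[str, str] = {}  # task name -> owner

    def list_children(self) -> List[str]:
        with self._lock:
            return list(self._children)

    def owner_of(self, name: str) -> Optional[str]:
        with self._lock:
            return self._children.get(name)

    def create(self, name: str, owner: str) -> None:
        # Atomic create-if-absent, mirroring the no-overwrite option.
        with self._lock:
            if name in self._children:
                raise NodeExistsError(name)
            self._children[name] = owner

    def expire(self, owner: str) -> None:
        # Imitates ephemeral cleanup: drop every child held by a dead owner.
        with self._lock:
            self._children = {
                n: o for n, o in self._children.items() if o != owner
            }


def claim_one(tasks: FakeTaskDir, owner: str, num_tasks: int = 100) -> Optional[str]:
    """Claim one missing task; return its name, or None if none is missing."""
    while True:
        present = set(tasks.list_children())
        missing = [str(i) for i in range(num_tasks) if str(i) not in present]
        if not missing:
            return None  # a real process would go back to waiting on its watch
        try:
            tasks.create(missing[0], owner)
            return missing[0]
        except NodeExistsError:
            continue  # lost the race: relist, as if the watch had fired


def simulate(
    num_nodes: int = 20, procs_per_node: int = 10, num_tasks: int = 100
) -> Tuple[FakeTaskDir, List[Optional[str]]]:
    """Run every process concurrently and record what each one claimed."""
    tasks = FakeTaskDir()
    owners = [f"node{n}:{p}" for n in range(num_nodes) for p in range(procs_per_node)]
    results: List[Optional[str]] = [None] * len(owners)

    def run(idx: int) -> None:
        results[idx] = claim_one(tasks, owners[idx], num_tasks)

    threads = [threading.Thread(target=run, args=(i,)) for i in range(len(owners))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tasks, results
```

Because create() is atomic, two processes can never both succeed on the same name, so no task is double-handled in this model. A task stays unhandled only while every process is busy or no idle process has relisted yet, which is where the watcher matters in the real system.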

Will this work?



-- 
Yours,
Zheng
