hadoop-zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: queue with limit to number of simultaneous tasks
Date Tue, 14 Jul 2009 16:45:59 GMT
It's hard to say; there are a number of variables. Some things to think 
about: Are the tasks idempotent? Do they have leases (like SQS)? Is one 
process responsible for processing the tasks, or will you have many vying 
for the jobs? Are the tasks ordered by creation date, or weighted by 
some factor? If processing for a task fails, should another processor 
start processing it, or drop the task, or move the task to a failed list 
(to guard against totally blocking processing if 2 tasks are continually 
failing due to, say, an error in the processing code)? Etc.

A simple approach might be to have a single queue of tasks:
http://hadoop.apache.org/zookeeper/docs/current/recipes.html#sc_recipes_Queues

where:
1) your task processors look for the first available task
2) if found they create an ephemeral node as a child of the task node
   (if the processor dies the ephemeral node will be removed)
3) the processor processes the task then deletes the task when "done"

The ephemeral node created in 2) indicates whether a task is available or not.

Processors set watches on unavailable tasks (the watch is on the 
ephemeral node) and re-run 1) when the watch eventually triggers.
(Hint: you have to use exists("task/child", true) for the availability check.)
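
To make the claim step concrete, here is a rough sketch of steps 1) to 3) against an in-memory stand-in for the znode tree. It is not the ZooKeeper client API: FakeTree, claim_first_available, finish, the "/claim" child name and the "/queue/task-NNNN" paths are all made up for illustration, and watches are not modeled at all. The only behaviour it assumes is that creating a node that already exists fails, and that ephemeral nodes vanish when their owner's session ends. Under this scheme the number of tasks in progress would presumably be bounded by the number of processors you run (2 in Alexander's example).

```python
class FakeTree:
    """In-memory stand-in for a znode tree (hypothetical, not the ZooKeeper API)."""

    def __init__(self):
        # path -> owning session name, or None for a persistent node
        self.nodes = {}

    def create(self, path, owner=None):
        """Create a node; return False if it already exists (models NodeExists)."""
        if path in self.nodes:
            return False
        self.nodes[path] = owner
        return True

    def delete(self, path):
        self.nodes.pop(path, None)

    def children(self, path):
        """Direct children of path, in sorted (queue) order."""
        prefix = path + "/"
        return sorted(
            p for p in self.nodes
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def expire_session(self, owner):
        """Drop every ephemeral node owned by a dead session."""
        for p in [p for p, o in self.nodes.items() if o == owner]:
            del self.nodes[p]


def claim_first_available(tree, queue, worker):
    """Steps 1) and 2): claim the first task that has no claim child yet."""
    for task in tree.children(queue):
        if tree.create(task + "/claim", owner=worker):
            return task
    return None  # nothing available; a real processor would set a watch here


def finish(tree, task):
    """Step 3): remove the claim child, then the task node itself."""
    tree.delete(task + "/claim")
    tree.delete(task)


tree = FakeTree()
tree.create("/queue")
for i in range(4):
    tree.create("/queue/task-%04d" % i)

first = claim_first_available(tree, "/queue", "worker-a")
second = claim_first_available(tree, "/queue", "worker-b")
print(first, second)
```

If worker-a's session dies, tree.expire_session("worker-a") removes its claim and the task becomes available to the next caller of claim_first_available, which is the behaviour the ephemeral node is meant to give you.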

Obviously, if 3) is only partially successful (i.e., you process the task 
and update, but fail before deleting the task node), then non-idempotence 
is going to be an issue. There are probably other considerations as well 
as the short list I gave above.
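
One possible way to cope with the partial-failure case is to make the update itself idempotent by recording which task ids have already been applied. The sketch below is only an illustration of that idea, not something the recipe prescribes: apply_once, the store dict and its "balance"/"applied" fields are hypothetical, and in a real system the update and the record of the task id would have to be committed atomically.

```python
def apply_once(store, task_id, amount):
    """Apply a task's effect only if this task id has not been applied before.

    `store` is a plain dict standing in for whatever the task updates
    (hypothetical; a real store must commit both changes atomically).
    """
    if task_id in store["applied"]:
        return False  # a previous processor already did the work
    store["balance"] += amount
    store["applied"].add(task_id)
    return True


store = {"balance": 0, "applied": set()}

# First processor applies the update, then dies before deleting the task node.
apply_once(store, "task-0000", 10)

# Second processor picks up the same task after the ephemeral node goes away.
reapplied = apply_once(store, "task-0000", 10)
print(store["balance"], reapplied)
```

With a guard like this, the second processor can safely go on to delete the task node without the update being applied twice.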


This sounds like a useful recipe to include in src/recipes if you are in 
a position to contribute back.

Regards,

Patrick

Alexander Sibiryakov wrote:
> Hello everybody!
> Please give me advice about designing application on zookeeper. I'd like 
> to implement queue with limit on number of simultaneous tasks. For 
> example I have 10 tasks, and I can process only 2 tasks simultaneously. 
> When one task is finished processing, system should start another, 
> supporting the number of task being in processing state within 2. Thanks.
> 
> 
