hadoop-zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Flavio Junqueira <...@yahoo-inc.com>
Subject Re: Distributed queue: how to ensure no lost items?
Date Thu, 08 Jan 2009 16:06:27 GMT
You can't simply leave an element in the queue until a consumer  
finishes processing it, otherwise multiple consumers may end up  
processing it. What about the following:

- Use a failure detector to detect which consumers are up;
- Before removing an element from the queue, a consumer creates a  
znode (can't be ephemeral) flagging that the consumer is processing  
that element, that I'll call "/processing/pr";
- Once the consumer is done with the element, it removes "/processing/ 
- There is a watchdog client that periodically verifies if the list of  
elements being processed contains any element being processed by a  
crashed consumer. In that case, this client places the element back in  
the queue. This process needs to be careful not to generate duplicates  
as it is possible that the consumer crashes before it has time to  
delete the element from the queue and after it has created "/ 

I'm not sure if there is a more efficient solution because now you  
need to implement a failure detection mechanism through zookeeper (or  
externally if you prefer) and an extra watchdog process.


On Jan 8, 2009, at 4:15 PM, Stuart White wrote:

> I'm interested in using ZooKeeper to provide a distributed
> producer/consumer queue for my distributed application.
> Of course I've been studying the recipes provided for queues,  
> barriers, etc...
> My question is: how can I prevent packets of work from being lost if a
> process crashes?
> For example, following the distributed queue recipe, when a consumer
> takes an item from the queue, it removes the first "item" znode under
> the "queue" znode.  But, if the consumer immediately crashes after
> removing the item from the queue, that item is lost.
> Is there a recipe or recommended approach to ensure that no queue
> items are lost in the event of process failure?
> Thanks!

View raw message