cassandra-user mailing list archives

From Andrew Miklas <and...@pagerduty.com>
Subject Distributed work-queues?
Date Sat, 26 Jun 2010 20:56:50 GMT
Hi all,

Has anyone written a work-queue implementation using Cassandra?

There's a section in the UseCases wiki page for "A distributed Priority
Job Queue" which looks perfect, but unfortunately it hasn't been
filled in yet.
http://wiki.apache.org/cassandra/UseCases#A_distributed_Priority_Job_Queue

I've been thinking about how best to do this, but every solution I've  
thought of seems to have some serious drawback.  The "range ghost"  
problem in particular creates some issues.  I'm assuming each job has  
a row within some column family, where the row's key is the time at  
which the job should be run.  To find the next job, you'd do a range  
query with a start a few hours in the past, and an end at the current  
time.  Once a job is completed, you delete the row.

The problem here is that you have to scan through
deleted-but-not-yet-GCed rows each time you run the query.  Is there a
better way?
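
To make the cost concrete, here is a rough sketch of the scheme described above as a toy in-memory model in Python. It does not use any Cassandra client; the JobTable class, its method names, and the integer run times are all made up for illustration. Deleted rows are kept as tombstones until an explicit compact() call, which stands in for garbage collection.

```python
import bisect


class JobTable:
    """Toy stand-in for a column family whose row key is the job's run time.

    Hypothetical model only: this is not a Cassandra API.
    """

    def __init__(self):
        self.keys = []   # sorted row keys (run times)
        self.rows = {}   # key -> payload, or None for a tombstone

    def add(self, run_at, payload):
        if run_at not in self.rows:
            bisect.insort(self.keys, run_at)
        self.rows[run_at] = payload

    def delete(self, run_at):
        # The key stays in the index as a tombstone until compact() runs.
        if run_at in self.rows:
            self.rows[run_at] = None

    def compact(self):
        # Stand-in for garbage collection of deleted rows.
        self.keys = [k for k in self.keys if self.rows[k] is not None]
        self.rows = {k: self.rows[k] for k in self.keys}

    def next_job(self, start, end):
        """Return (key, payload, rows_scanned) for the first live row in [start, end]."""
        scanned = 0
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_right(self.keys, end)
        for key in self.keys[lo:hi]:
            scanned += 1
            if self.rows[key] is not None:
                return key, self.rows[key], scanned
        return None, None, scanned


table = JobTable()
for t in range(100, 110):
    table.add(t, f"job-{t}")
for t in range(100, 108):
    table.delete(t)  # eight completed jobs, now tombstones

key, payload, scanned = table.next_job(start=0, end=200)
```

In this model the query has to step over every tombstone that sorts before the first live row, so the scan length grows with the number of recently completed jobs until compact() is called.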

Preventing more than one worker from starting the same job seems like  
it would be a problem too.  You'd either need an external locking  
manager, or have to use some other protocol where workers write their  
ID into the row and then immediately read it back to confirm that they  
are the owner of the job.
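
One possible shape for the write-then-read-back idea, again as a toy in-memory sketch rather than real Cassandra code. ClaimStore and try_claim are hypothetical names. The sketch assumes each worker writes its own claim column, that every read sees all earlier writes, and that the store assigns increasing timestamps; a real cluster with replication lag and clock skew would not give these guarantees for free.

```python
import itertools


class ClaimStore:
    """Toy single-process store: one row of claim columns per job (hypothetical)."""

    def __init__(self):
        self._clock = itertools.count(1)   # logical timestamps, assumed monotonic
        self._claims = {}                  # job_key -> {worker_id: timestamp}

    def write_claim(self, job_key, worker_id):
        claims = self._claims.setdefault(job_key, {})
        if worker_id not in claims:        # keep the first claim's timestamp
            claims[worker_id] = next(self._clock)

    def read_claims(self, job_key):
        return dict(self._claims.get(job_key, {}))


def try_claim(store, job_key, worker_id):
    """Write our ID into the job's row, read the row back, and check ownership.

    The earliest claim wins; ties are broken by worker ID.
    """
    store.write_claim(job_key, worker_id)
    claims = store.read_claims(job_key)
    winner = min(claims, key=lambda w: (claims[w], w))
    return winner == worker_id


store = ClaimStore()
a_owns = try_claim(store, "job-108", "worker-a")
b_owns = try_claim(store, "job-108", "worker-b")
```

Here worker-a claims first and sees itself as the owner, while worker-b reads back both claims and backs off. Whether the same holds against a real cluster depends on the consistency level of the reads and writes, which this sketch does not model.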

Any ideas here?  Has anyone come up with a nice implementation?  Is  
Cassandra not well suited for queue-like tasks?



Thanks,
Andrew
