cassandra-user mailing list archives

From "Brian O'Neill" <>
Subject Re: How to process new rows in parallel?
Date Fri, 03 Aug 2012 18:27:10 GMT
If you are deleting the messages after processing, it sounds like you
are using Cassandra as a work queue.

Here are some links for implementing a distributed queue in Cassandra:

There is a placeholder on the use cases wiki for this, but no info:

We were looking to do the same thing, but in the end decided to go with Kafka.
Given your throughput requirements, Kafka might be a good option for
you as well.
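
To make the parallelism idea concrete, here is a rough sketch of the partitioning approach that Kafka-style systems rely on: messages are hashed into a fixed set of buckets, and each worker owns a disjoint subset of buckets, so no two workers ever compete for the same message and no locking is needed. This is only an illustration in plain Python, not Kafka's or Cassandra's API; the names NUM_BUCKETS, bucket_for and buckets_for_worker are hypothetical, and it assumes every worker knows its own index and the current worker count.

```python
import zlib

# Hypothetical fixed number of buckets (think topic partitions, or one wide row per bucket).
NUM_BUCKETS = 16


def bucket_for(message_key: str) -> int:
    """Map a message key to a bucket with a deterministic hash."""
    return zlib.crc32(message_key.encode("utf-8")) % NUM_BUCKETS


def buckets_for_worker(worker_index: int, num_workers: int) -> list:
    """Return the buckets owned by one worker; ownership is disjoint by construction."""
    return [b for b in range(NUM_BUCKETS) if b % num_workers == worker_index]


if __name__ == "__main__":
    # Writers put each message into its bucket.
    for key in ("msg-1", "msg-2", "msg-3"):
        print(key, "-> bucket", bucket_for(key))

    # Each of three workers reads and deletes only from the buckets it owns.
    for worker in range(3):
        print("worker", worker, "owns", buckets_for_worker(worker, 3))
```

Adding a worker changes the ownership map, so the workers still have to agree on the current worker count through some coordination mechanism. In Kafka, the consumer group takes care of dividing partitions among consumers.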


On Fri, Aug 3, 2012 at 2:18 PM, Philip Nelson
<> wrote:
> Hello,
> I am using a Column Family in Cassandra to store incoming messages, which arrive at a
> high rate (100s of thousands per second). I then have a process wake up periodically to work
> on those messages, and then delete them. I'd like to understand how I could have multiple
> processes running, each pulling off a bunch of messages in parallel. It would be nice to be
> able to add processes dynamically, and not have to explicitly assign message ranges to various
>
> Any suggestions on how to ensure that each process pulls off a different bunch of messages?
> Any recommended design patterns? I was going to look at qsandra too, for inspiration. Would
> this be worthwhile?
>
> If this was a relational database, I would have the processes lock the table (or perhaps
> a row), set flags on a row indicating that it's being "processed", and then unlock. Processes
> would choose messages by SELECTing on unflagged messages. I'm not sure how this might map
> to Cassandra. I realise it may not. Even if I configure the cluster such that setting a flag
> on a row requires all nodes to be written, two processes could still race setting that flag,
>
> I am open to the idea that it might help to store the messages in wide rows, if that
>
> Thanks,
> Philip

Brian ONeill
Lead Architect, Health Market Science (
