incubator-cassandra-user mailing list archives

From Philip Nelson <philipomailbox-c...@yahoo.com>
Subject How to process new rows in parallel?
Date Fri, 03 Aug 2012 18:18:06 GMT
Hello,

I am using a Column Family in Cassandra to store incoming messages, which arrive at a high
rate (hundreds of thousands per second). A process then wakes up periodically, works on
those messages, and deletes them. I'd like to understand how I could have multiple processes
running, each pulling off a batch of messages in parallel. It would be nice to be able to
add processes dynamically, without having to explicitly assign message ranges to each process.

Any suggestions on how to ensure that each process pulls off a different bunch of messages?
Any recommended design patterns? I was going to look at qsandra too, for inspiration. Would
this be worthwhile?

If this were a relational database, I would have the processes lock the table (or perhaps a
row), set flags on a row indicating that it's being "processed", and then unlock. Processes
would choose messages by SELECTing on unflagged messages. I'm not sure how this might map
to Cassandra. I realise it may not. Even if I configure the cluster such that setting a flag
on a row must be written to all nodes, two processes could still race setting that flag,
right?
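
To make the relational pattern concrete, here is a minimal sketch of what I mean, using
Python's built-in sqlite3 module as a stand-in for a relational database. The table name
`messages`, the column `claimed_by`, and the helpers `make_db` and `claim_batch` are all
hypothetical names made up for illustration; this is not Cassandra code, only the
lock / flag / unlock idea that I am unsure how to map.

```python
import sqlite3


def make_db():
    # In-memory database; isolation_level=None means we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE messages ("
        "id INTEGER PRIMARY KEY, body TEXT, claimed_by TEXT)"
    )
    conn.executemany(
        "INSERT INTO messages (body) VALUES (?)",
        [(f"msg-{i}",) for i in range(10)],
    )
    return conn


def claim_batch(conn, worker, batch_size):
    """Lock, pick unflagged messages, flag them for `worker`, then unlock."""
    # BEGIN IMMEDIATE takes the write lock up front, so the SELECT and the
    # UPDATE below cannot interleave with another writer's claim.
    conn.execute("BEGIN IMMEDIATE")
    try:
        rows = conn.execute(
            "SELECT id FROM messages WHERE claimed_by IS NULL "
            "ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        ids = [row[0] for row in rows]
        conn.executemany(
            "UPDATE messages SET claimed_by = ? WHERE id = ?",
            [(worker, msg_id) for msg_id in ids],
        )
        conn.execute("COMMIT")  # releases the lock
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return ids


if __name__ == "__main__":
    conn = make_db()
    print(claim_batch(conn, "worker-1", 4))
    print(claim_batch(conn, "worker-2", 4))
```

The point is that the SELECT of unflagged rows and the UPDATE that flags them happen under
one lock, so two workers never claim the same message. It is that atomic read-then-flag step
that I don't see how to get in Cassandra.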

I am open to storing the messages in wide rows, if that helps.

Thanks,

Philip
