cassandra-user mailing list archives

From "Brian O'Neill" <b...@alumni.brown.edu>
Subject Re: How to process new rows in parallel?
Date Fri, 03 Aug 2012 18:27:10 GMT
If you are deleting the messages after processing, it sounds like you
are using Cassandra as a work queue.
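The hard part of any work queue is the claim step described in the quoted message: if a worker first reads "is this message flagged?" and then writes the flag, two workers can both see the message as free and both process it. The claim has to be a single atomic put-if-absent. Below is a minimal in-memory sketch of that idea in plain Java. It is not Cassandra code: the class and method names are hypothetical, and a ConcurrentHashMap stands in for the flag. A plain Cassandra write is an upsert resolved by timestamp, not a conditional write, so it does not give this guarantee by itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ClaimSketch {
    // Hypothetical in-memory stand-in for a per-message "claimed" flag.
    private final ConcurrentHashMap<String, String> claims = new ConcurrentHashMap<>();

    // Atomic claim: putIfAbsent returns null only for the first caller,
    // so exactly one worker wins each message id.
    public boolean tryClaim(String messageId, String workerId) {
        return claims.putIfAbsent(messageId, workerId) == null;
    }

    // Starts `workers` threads that all scan the same list of message ids
    // and returns how many messages were processed in total.
    public static int runWorkers(List<String> messageIds, int workers) {
        ClaimSketch queue = new ClaimSketch();
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            String workerId = "worker-" + w;
            pool.submit(() -> {
                for (String id : messageIds) {
                    if (queue.tryClaim(id, workerId)) {
                        processed.incrementAndGet(); // "process" the message
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            ids.add("msg-" + i);
        }
        // Each of the 1000 messages is processed exactly once across 4 workers.
        System.out.println(runWorkers(ids, 4)); // prints 1000
    }
}
```

The point of the sketch is that correctness comes from the atomicity of the claim, not from how many replicas acknowledge the write.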

Here are some links for implementing a distributed queue in Cassandra:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Distributed-work-queues-td5226248.html
http://comments.gmane.org/gmane.comp.db.cassandra.user/16633

There is a placeholder on the use cases wiki for this, but no info:
http://wiki.apache.org/cassandra/UseCases#A_distributed_Priority_Job_Queue

We were looking to do the same thing, but in the end decided to go with Kafka.
Given your throughput requirements, Kafka might be a good option for
you as well.
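Part of why a partitioned log fits this problem is that it sidesteps claiming altogether: messages are routed by key into a fixed number of partitions, and each partition is owned by exactly one worker at a time, with ownership recomputed when a worker joins or leaves. That is roughly the idea behind Kafka's consumer groups. The sketch below illustrates only that idea; it is not the Kafka client API or Kafka's actual assignment algorithm. The names and the round-robin scheme are hypothetical, and it assumes a non-empty list of distinct worker ids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionAssignmentSketch {
    // Route a message key to one of `partitions` buckets (hypothetical scheme).
    public static int partitionFor(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Round-robin assignment of partitions to the current workers.
    // Recomputed whenever the worker set changes; every partition has
    // exactly one owner, so workers never compete for the same message.
    // Assumes `workers` is non-empty and contains no duplicates.
    public static Map<String, List<Integer>> assign(List<String> workers, int partitions) {
        Map<String, List<Integer>> owned = new TreeMap<>();
        for (String w : workers) {
            owned.put(w, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            owned.get(workers.get(p % workers.size())).add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        System.out.println(assign(List.of("a", "b"), 6));      // {a=[0, 2, 4], b=[1, 3, 5]}
        System.out.println(assign(List.of("a", "b", "c"), 6)); // {a=[0, 3], b=[1, 4], c=[2, 5]}
    }
}
```

Adding worker "c" just reshuffles partition ownership; no message ranges are assigned by hand, which is the property asked about in the quoted message.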

-brian


On Fri, Aug 3, 2012 at 2:18 PM, Philip Nelson <philipomailbox-cass@yahoo.com> wrote:
> Hello,
>
> I am using a Column Family in Cassandra to store incoming messages, which arrive at a
> high rate (100s of thousands per second). I then have a process wake up periodically to work
> on those messages, and then delete them. I'd like to understand how I could have multiple
> processes running, each pulling off a bunch of messages in parallel. It would be nice to be
> able to add processes dynamically, and not have to explicitly assign message ranges to various
> processes.
>
> Any suggestions on how to ensure that each process pulls off a different bunch of messages?
> Any recommended design patterns? I was going to look at qsandra too, for inspiration. Would
> this be worthwhile?
>
> If this were a relational database, I would have the processes lock the table (or perhaps
> a row), set flags on a row indicating that it's being "processed", and then unlock. Processes
> would choose messages by SELECTing on unflagged messages. I'm not sure how this might map
> to Cassandra. I realise it may not. Even if I configure the cluster such that setting a flag
> on a row requires all nodes to be written, two processes could still race setting that flag,
> right?
>
> I am also open to storing the messages in wide rows, if that helps.
>
> Thanks,
>
> Philip



-- 
Brian O'Neill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
