incubator-cassandra-user mailing list archives

From Michael Greene <michael.gre...@gmail.com>
Subject Re: Concurrent updates
Date Fri, 17 Jul 2009 14:41:24 GMT
Even if CQL SET allowed for the operation you're describing, it's at
odds with the availability and consistency constraints of Cassandra.
Another process, somewhere else, could be reading and writing that
frequency value at the same time.  Reducing the operation to one
statement does not make it transactional or idempotent.
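
To make the race concrete, here is a minimal sketch of the lost update.
It is not Cassandra code: the class LostUpdateDemo and its in-memory map
are hypothetical stand-ins for the single 'frequency' cell, and the
interleaving of the two writers is forced by hand.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for one cell that stores a count as a String.
class LostUpdateDemo {
    private final Map<String, String> cell = new HashMap<>();

    LostUpdateDemo(int initial) {
        cell.put("frequency", Integer.toString(initial));
    }

    String read() {
        return cell.get("frequency");
    }

    void write(String value) {
        cell.put("frequency", value);
    }

    // Both writers read the cell before either one writes it back,
    // which is the interleaving that concurrent tasks can produce.
    static int interleaved(int initial, int addA, int addB) {
        LostUpdateDemo store = new LostUpdateDemo(initial);
        int seenByA = Integer.parseInt(store.read());
        int seenByB = Integer.parseInt(store.read());
        store.write(Integer.toString(seenByA + addA));
        store.write(Integer.toString(seenByB + addB)); // overwrites A's update
        return Integer.parseInt(store.read());
    }

    public static void main(String[] args) {
        // Starting at 10 and adding 3 and 5 should give 18,
        // but writer A's increment is lost.
        System.out.println(interleaved(10, 3, 5)); // prints 15
    }
}
```

Folding the read and the write into one statement does not remove this
window unless the store itself serializes the two updates.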

Unless you are looking for estimates in that cell and the delay
between processing updates to that cell is large enough to provide
reasonable estimates, you will want to look at a queueing solution or
a transaction solution outside of Cassandra.  There are a few issues
open in JIRA that would allow you to up the consistency on this
particular read/write call to ensure that you are getting better
estimates, but this is a scenario that Cassandra does not handle well.

If you can think of a way to model your operation to be idempotent,
then that would be preferable.  Otherwise an external queue (such as
AMQP) or transaction system (such as Zookeeper) is all I can think of
at the moment.
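
One possible idempotent model, sketched under assumptions: each
Map/Reduce task writes its own count to its own column, and a reader
sums the columns. The class PerTaskCounts, the "frequency:<taskId>"
column naming, and the in-memory map standing in for the row are all
hypothetical, not an existing Cassandra API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of one row: column name -> String value.
class PerTaskCounts {
    private final Map<String, String> row = new TreeMap<>();

    // Each task only ever writes its own column, so no task has to
    // read another task's value first. Repeating the same write
    // leaves the row unchanged, which makes retries safe.
    void report(String taskId, int count) {
        row.put("frequency:" + taskId, Integer.toString(count));
    }

    // The total is computed at read time by summing all columns.
    int total() {
        int sum = 0;
        for (String value : row.values()) {
            sum += Integer.parseInt(value);
        }
        return sum;
    }

    public static void main(String[] args) {
        PerTaskCounts counts = new PerTaskCounts();
        counts.report("task-1", 3);
        counts.report("task-2", 5);
        counts.report("task-1", 3); // retried task, same write again
        System.out.println(counts.total()); // prints 8
    }
}
```

The trade-off is that reads get more expensive as the number of task
columns grows, and a total read while tasks are still running is only
an estimate.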

Michael

On Fri, Jul 17, 2009 at 9:14 AM, Ivan Chang<ivan.chang@medigy.com> wrote:
> I have the following scenario that I would like a best solution for.
>
> Here's the scenario:
>
> Table1.Standard1['cassandra']['frequency']
>
> it is used for keeping track of how many times the word "cassandra"
> appeared.
>
> Let's say we have a bunch of articles stored in Hadoop, a Map/Reduce greps
> all articles throughout the Hadoop cluster that matches the pattern
> ^cassandra$
> and updates Table1.Standard1['cassandra']['frequency'].  Hence
> Table1.Standard1['cassandra']['frequency'] will be updated concurrently.
>
> One of the issues I am facing is that
> Table1.Standard1['cassandra']['frequency']
> stores the count as a String (I am using Java), so in order to update the
> frequency
> properly, the thread that's running the Map/Reduce will have to retrieve
> Table1.Standard1['cassandra']['frequency'] in its native String format and
> hold
> that in temp (Java String), convert into int, then add the new counts in,
> and finally
> "SET Table1.Standard1['cassandra']['frequency']. =  '" + temp.toString() +
> ''"
>
> During the entire process, how do we guarantee concurrency?  The Cql SET
> does
> not allow something like
>
> SET Table1.Standard1['cassandra']['frequency']. =
> Table1.Standard1['cassandra']['frequency']. + newCounts
>
> since there's only one String type.
>
> What would be the best solution in this situation?
>
> Thanks,
> Ivan
