incubator-cassandra-user mailing list archives

From Sandeep Tata <sandeep.t...@gmail.com>
Subject Re: Concurrent updates
Date Sat, 18 Jul 2009 17:01:16 GMT
You could (for now) store counters in
Table1.Standard1['cassandra']['frequency-mapperid'].
At the end, you do a get_slice and add them up.
This is really bad for fault-tolerance -- you'll get wrong counts if
mappers were restarted because of failures. But then, you'd have the
same problem if you (transactionally) incremented a single counter
too.
This way, modulo failures your answer is still correct.
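
The per-mapper layout suggested above can be sketched roughly as follows. This is only an illustration under stated assumptions: the row is modelled as an in-memory Map rather than a real Cassandra client call, and the names PerMapperCounters, writeCount and total are hypothetical. The final loop stands in for the get_slice over the row.

```java
import java.util.Map;

// Hypothetical in-memory stand-in for the row Table1.Standard1['cassandra'].
// This is NOT the Cassandra client API; it only models column name -> String value.
class PerMapperCounters {
    // Column names follow the 'frequency-mapperid' pattern from the message above.
    static final String PREFIX = "frequency-";

    // Each mapper writes only its own column, so no two mappers update the same cell.
    static void writeCount(Map<String, String> row, String mapperId, long count) {
        row.put(PREFIX + mapperId, Long.toString(count));
    }

    // Stand-in for the final get_slice: read every per-mapper column and add them up.
    static long total(Map<String, String> row) {
        long sum = 0;
        for (Map.Entry<String, String> column : row.entrySet()) {
            if (column.getKey().startsWith(PREFIX)) {
                sum += Long.parseLong(column.getValue());
            }
        }
        return sum;
    }
}
```

Because every writer owns a distinct column, no read-modify-write of a shared value is needed; the String-to-number conversion happens once, at read time.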



On Fri, Jul 17, 2009 at 8:41 AM, Jonathan Ellis<jbellis@gmail.com> wrote:
> This is the kind of inconsistency that vector clocks can handle but
> the more simplistic timestamp-based resolution cannot.
>
> Of test-and-set vs vector clocks, vector clocks fit Cassandra much better.
>
> -Jonathan
>
> On Fri, Jul 17, 2009 at 9:59 AM, Jun Rao<junrao@almaden.ibm.com> wrote:
>> This is a case where a test-and-set feature would be useful. See the
>> following JIRA. We just don't have it nailed down yet.
>> https://issues.apache.org/jira/browse/CASSANDRA-48
>>
>> Jun
>> IBM Almaden Research Center
>> K55/B1, 650 Harry Road, San Jose, CA 95120-6099
>>
>> junrao@almaden.ibm.com
>>
>> Ivan Chang <ivan.chang@medigy.com>
>>
>> 07/17/2009 07:14 AM
>> Please respond to: cassandra-user@incubator.apache.org
>> To: cassandra-user@incubator.apache.org
>> Subject: Concurrent updates
>>
>> I have the following scenario that I would like the best solution for.
>>
>> Here's the scenario:
>>
>> Table1.Standard1['cassandra']['frequency']
>>
>> it is used for keeping track of how many times the word "cassandra"
>> appeared.
>>
>> Let's say we have a bunch of articles stored in Hadoop, and a Map/Reduce
>> job greps all articles throughout the Hadoop cluster for matches of the
>> pattern ^cassandra$
>> and updates Table1.Standard1['cassandra']['frequency'].  Hence
>> Table1.Standard1['cassandra']['frequency'] will be updated concurrently.
>>
>> One of the issues I am facing is that
>> Table1.Standard1['cassandra']['frequency']
>> stores the count as a String (I am using Java), so in order to update the
>> frequency properly, the thread that's running the Map/Reduce has to
>> retrieve Table1.Standard1['cassandra']['frequency'] in its native String
>> format, hold that in a temp variable (a Java String), convert it to an int,
>> add the new counts, and finally issue
>> "SET Table1.Standard1['cassandra']['frequency']. =  '" + temp.toString() +
>> ''"
>>
>> During the entire process, how do we guarantee that concurrent updates
>> are applied correctly?  The CQL SET does
>> not allow something like
>>
>> SET Table1.Standard1['cassandra']['frequency']. =
>> Table1.Standard1['cassandra']['frequency']. + newCounts
>>
>> since there's only one String type.
>>
>> What would be the best solution in this situation?
>>
>> Thanks,
>> Ivan
>>
>
