This is a case where a test-and-set feature would be useful. See the following JIRA. We just don't have it nailed down yet.

IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099

          Ivan Chang <>

          07/17/2009 07:14 AM

Concurrent updates

I have the following scenario and would like to find the best solution for it.
Here's the scenario:
Table1.Standard1['cassandra']['frequency'] is used to keep track of how many times the word "cassandra" appears.
Let's say we have a bunch of articles stored in Hadoop, and a Map/Reduce job greps
all articles throughout the Hadoop cluster that match the pattern ^cassandra$
and updates Table1.Standard1['cassandra']['frequency']. Hence
Table1.Standard1['cassandra']['frequency'] will be updated concurrently.
One of the issues I am facing is that Table1.Standard1['cassandra']['frequency']
stores the count as a String (I am using Java), so in order to update the frequency
properly, the thread running the Map/Reduce task has to retrieve
Table1.Standard1['cassandra']['frequency'] in its native String format, hold
it in temp (a Java String), convert it to an int, add the new counts, and finally run
"SET Table1.Standard1['cassandra']['frequency'] = '" + (Integer.parseInt(temp) + newCounts) + "'"
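To make the race concrete, here is a minimal Java sketch of the read, convert, add, SET sequence described above, replayed by hand for two writers. The `HashMap` standing in for the column, the key `"cassandra/frequency"`, and the method `interleavedIncrement` are hypothetical illustrations, not Cassandra API; the point is only that when both writers read before either writes, one partial count is silently lost.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Table1.Standard1['cassandra']['frequency']:
// a map from key to the count stored as a String.
public class LostUpdateDemo {
    static final String KEY = "cassandra/frequency";

    // Replays the interleaving in which writers A and B both read the old
    // value before either of them writes. Returns the final stored String.
    public static String interleavedIncrement(int initial, int countA, int countB) {
        Map<String, String> store = new HashMap<>();
        store.put(KEY, Integer.toString(initial));

        // Step 1: both writers read the current value into their own temp.
        String tempA = store.get(KEY);
        String tempB = store.get(KEY);

        // Step 2: each converts its temp to an int and adds its own partial count.
        int newA = Integer.parseInt(tempA) + countA;
        int newB = Integer.parseInt(tempB) + countB;

        // Step 3: each writes back unconditionally; B's write overwrites A's.
        store.put(KEY, Integer.toString(newA));
        store.put(KEY, Integer.toString(newB));
        return store.get(KEY);
    }

    public static void main(String[] args) {
        // The correct total would be 10 + 3 + 4 = 17, but A's update is lost.
        System.out.println(interleavedIncrement(10, 3, 4)); // prints 14
    }
}
```

With an initial count of 10 and partial counts of 3 and 4, the stored value ends at 14 rather than 17.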
During this entire process, how do we guarantee a correct count when several threads update the value concurrently? The CQL SET does
not allow something like
SET Table1.Standard1['cassandra']['frequency'] = Table1.Standard1['cassandra']['frequency'] + newCounts
since there's only one type, String.
What would be the best solution in this situation?
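The test-and-set idea from the reply at the top can be sketched as a compare-and-set retry loop: read the String, compute the new total, and write it only if the stored value is still the one that was read, retrying otherwise. Per the reply, Cassandra did not have such a primitive nailed down at the time, so this hypothetical sketch uses the JDK's `ConcurrentHashMap.replace(key, expectedValue, newValue)`, a real atomic conditional replace, as a stand-in for it. The class name, key, and method names are illustrative assumptions, not Cassandra API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class CasCounterDemo {
    static final String KEY = "cassandra/frequency";

    // Hypothetical store; replace(key, expected, update) plays the role
    // of a test-and-set on the stored String.
    private final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

    public CasCounterDemo(int initial) {
        store.put(KEY, Integer.toString(initial));
    }

    // Read, compute, and write only if the value is still the one we read;
    // if another writer changed it first, re-read and retry.
    public void add(int newCounts) {
        while (true) {
            String temp = store.get(KEY);
            String updated = Integer.toString(Integer.parseInt(temp) + newCounts);
            if (store.replace(KEY, temp, updated)) {
                return;
            }
        }
    }

    public String get() {
        return store.get(KEY);
    }

    // Starts `threads` concurrent writers, each adding 1 `perThread` times,
    // waits for all of them, and returns the final stored String.
    public static String runConcurrently(int threads, int perThread) {
        CasCounterDemo counter = new CasCounterDemo(0);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.add(1);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(8, 1000)); // prints 8000
    }
}
```

Eight writers each adding 1 a thousand times end at "8000" because a write only succeeds against the exact value it was computed from; with an unconditional write, some of those increments could be lost.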