From: Jun Rao
Date: Fri, 17 Jul 2009 07:59:09 -0700
To: cassandra-user@incubator.apache.org
Subject: Re: Concurrent updates
In-Reply-To: <70cd74010907170714l66127030t590f51c82045229d@mail.gmail.com>

This is a case where a test-and-set feature would be useful. See the following JIRA. We just don't have it nailed down yet.

https://issues.apache.org/jira/browse/CASSANDRA-48

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099

junrao@almaden.ibm.com

From: Ivan Chang <ivan.chang@medigy.com>
Date: 07/17/2009 07:14 AM
To: cassandra-user@incubator.apache.org
Subject: Concurrent updates
Please respond to cassandra-user@incubator.apache.org

I have the following scenario and would like the best solution for it.

Here's the scenario:

Table1.Standard1['cassandra']['frequency']

It is used for keeping track of how many times the word "cassandra" appeared.

Let's say we have a bunch of articles stored in Hadoop. A Map/Reduce job greps all articles throughout the Hadoop cluster that match the pattern ^cassandra$ and updates Table1.Standard1['cassandra']['frequency']. Hence Table1.Standard1['cassandra']['frequency'] will be updated concurrently.

One of the issues I am facing is that Table1.Standard1['cassandra']['frequency'] stores the count as a String (I am using Java), so in order to update the frequency properly, the thread that's running the Map/Reduce will have to retrieve Table1.Standard1['cassandra']['frequency'] in its native String format and hold that in temp (Java String), convert it into an int, then add the new counts in, and finally

"SET Table1.Standard1['cassandra']['frequency'] = '" + temp.toString() + "'"

During the entire process, how do we guarantee concurrency? The CQL SET does not allow something like

SET Table1.Standard1['cassandra']['frequency'] = Table1.Standard1['cassandra']['frequency'] + newCounts

since there's only one String type.

What would be the best solution in this situation?

Thanks,
Ivan
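The read, convert, add, and SET sequence described above loses updates when two threads read the same value before either writes. A minimal sketch of the test-and-set idea from the reply follows. It assumes a hypothetical in-memory map standing in for Table1.Standard1; the class name, key, and methods are illustrative only and are not a Cassandra API, which per the reply did not yet offer this feature.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TestAndSetCounter {
    // Hypothetical stand-in for Table1.Standard1: key -> count stored as a String.
    private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();

    public TestAndSetCounter(String key, String initial) {
        store.put(key, initial);
    }

    /** Adds delta to the String-encoded count, retrying until the test-and-set succeeds. */
    public int add(String key, int delta) {
        while (true) {
            String seen = store.get(key);                 // read the current String value
            int updated = Integer.parseInt(seen) + delta; // convert to int and add
            // Test-and-set: write only if the stored value is still the one we read.
            if (store.replace(key, seen, Integer.toString(updated))) {
                return updated;
            }
            // Another writer changed the value first; re-read and try again.
        }
    }

    public String get(String key) {
        return store.get(key);
    }

    /** Runs several threads that each add 1 repeatedly, then returns the final count. */
    public static String runDemo(int threads, int incrementsPerThread) {
        String key = "cassandra/frequency"; // illustrative key name
        TestAndSetCounter counter = new TestAndSetCounter(key, "0");
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.add(key, 1);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get(key);
    }

    public static void main(String[] args) {
        System.out.println(runDemo(8, 1000));
    }
}
```

Because a write is accepted only when the stored String still equals the value that was read, a thread that loses the race retries with the fresh value instead of overwriting it, so no increment is dropped. A real store would need to provide the conditional write itself; this sketch only shows the client-side retry loop around it.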
