Subject: Re: Cassandra bolt
From: Vladi Feigin
To: user@storm.incubator.apache.org
Date: Fri, 10 Jan 2014 16:37:52 +0200

Hi,
If you use Cassandra counters, eventually you will have a value of 8 on all nodes.
The 3 will not override the 5, or vice versa.
It will certainly happen eventually. For some time different clients may see different values, but in the end it will be 8.
Vladi
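To make this concrete, here is a toy model in Python. It assumes a simplified, G-counter-style design in which each node keeps a separate shard for the increments it accepted, and nodes exchange shards rather than totals. It illustrates why increments commute; it is not Cassandra's actual counter implementation, and the names `Replica`, `increment`, and `merge_from` are hypothetical.

```python
from typing import Dict, List


class Replica:
    """Toy counter replica that keeps one shard per node that accepted writes."""

    def __init__(self, name: str) -> None:
        self.name = name
        # shards[origin] = sum of the increments that node `origin` accepted
        self.shards: Dict[str, int] = {}

    def increment(self, delta: int) -> None:
        # A client write lands on this node and only touches its own shard.
        self.shards[self.name] = self.shards.get(self.name, 0) + delta

    def merge_from(self, other: "Replica") -> None:
        # Exchange shards, not totals. Shards only grow here (positive
        # increments assumed), so taking the max never drops an increment.
        for origin, value in other.shards.items():
            self.shards[origin] = max(self.shards.get(origin, 0), value)

    def value(self) -> int:
        # The counter's value is the sum of every shard this node knows about.
        return sum(self.shards.values())


nodes: List[Replica] = [Replica("node1"), Replica("node2"), Replica("node3")]
nodes[1].increment(5)  # the 5 is accepted by node2
nodes[0].increment(3)  # the 3 is accepted by node1 before the 5 propagates

before = [n.value() for n in nodes]  # clients may read different values now

# Propagation: each node merges the shards of every other node.
for target in nodes:
    for source in nodes:
        if source is not target:
            target.merge_from(source)

after = [n.value() for n in nodes]
print(before, after)  # [3, 5, 0] [8, 8, 8]
```

The merges can happen in any order and still give 8, which is why the 3 and the 5 cannot overwrite each other in this model. Real Cassandra counters also allow decrements and are more involved; this sketch only covers positive increments.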



On Mon, Jan 6, 2014 at 5:21 PM, Adrian Mocanu <amocanu@verticalscope.com> wrote:

Hi

I am actually looking into using CassandraCounterBatchingBolt, but at the moment I'm not sure how Cassandra handles these eventual-consistency issues, so I need to research that. The reason I mention this issue is that I cannot find anywhere in the code where a read happens before a write, which bothers me. Maybe Cassandra does that itself with counter columns? I don't know.


The issue I'm talking about is updating the same counter consecutively, but faster than the updates propagate to the other Cassandra nodes.


Example:

Say I have 3 Cassandra nodes. The counters on each of these nodes are 0.

Node1:0, node2:0, node3:0

An increment comes: 5

5 -> Node1:0, node2:0, node3:0

The increment of 5 starts at node2 and still needs to propagate to node1 and node3:

Node1:0, node2:5, node3:0

In the meantime, another increment arrives before the previous increment has propagated:

3 -> Node1:0, node2:5, node3:0

Assuming the 3 starts at a different node than the one where the 5 started, we have:

Node1:3, node2:5, node3:0

Now, if the 3 gets propagated to the other nodes AS AN INCREMENT and not as a new value (and the same for the 5), then eventually they would all equal 8, which is what I want.

If the 3 overwrites the 5 (because it has a later timestamp), that is a problem; it's not what I want.
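For contrast, a minimal sketch of the failure mode described here. Regular (non-counter) Cassandra columns resolve conflicting writes by keeping the one with the later timestamp; if the counter were stored that way, the later 3 would replace the 5 and every node would settle on 3. The `Cell` type and `reconcile` helper are hypothetical and simplify that reconciliation.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    value: int
    timestamp: int  # write timestamp; the larger one wins


def reconcile(a: Cell, b: Cell) -> Cell:
    # Last-write-wins: keep whichever cell was written later.
    return a if a.timestamp >= b.timestamp else b


# Each node holds one cell for the same key.
node1 = Cell(value=3, timestamp=2)  # the 3 was written later, at node1
node2 = Cell(value=5, timestamp=1)  # the 5 was written first, at node2
node3 = Cell(value=0, timestamp=0)  # initial value, not yet updated

winner = reconcile(reconcile(node1, node2), node3)
converged = [winner.value] * 3  # after repair, every node holds the winning cell
print(converged)  # [3, 3, 3]: the 5 is lost
```

The nodes still converge, but to the wrong total, because a plain value carries no record of the increments that produced it.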


I'll see what the Cassandra group says... Or, if the creators of CassandraCounterBatchingBolt are on this group, please let me know :)

Thanks

Adrian


From: Vladi Feigin [mailto:vladif86@gmail.com]
Sent: January-04-14 2:00 AM


To: user@storm.incubator.apache.org
Subject: Re: Cassandra bolt


Hi Adrian,


Why don't you use C* (Cassandra) counters? It looks like your scenario fits them. I think CassandraCounterBatchingBolt provides what you need.

Vladi
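With a counter column the bolt never has to read before writing: each tuple becomes an increment that Cassandra applies on the server. A minimal sketch that turns tuples into CQL counter `UPDATE` statements with `%s` placeholders, assuming a hypothetical table `tuple_counts` whose schema is shown in the comment. It only builds statement/parameter pairs and does not reproduce how CassandraCounterBatchingBolt itself talks to Cassandra.

```python
from typing import List, NamedTuple, Tuple

# Hypothetical schema, for illustration only:
#   CREATE TABLE tuple_counts (
#       name text, ts bigint, total counter,
#       PRIMARY KEY (name, ts));
INCREMENT_CQL = (
    "UPDATE tuple_counts SET total = total + %s "
    "WHERE name = %s AND ts = %s"
)


class TupleData(NamedTuple):
    name: str
    count: int
    time: int


def to_increments(batch: List[TupleData]) -> List[Tuple[str, Tuple[int, str, int]]]:
    # One blind increment per tuple. No SELECT is issued: the server adds
    # the delta to whatever value the counter currently holds.
    return [(INCREMENT_CQL, (t.count, t.name, t.time)) for t in batch]


batch = [TupleData("x1", 3, 1111), TupleData("x2", 4, 1111), TupleData("x1", 5, 1111)]
for cql, params in to_increments(batch):
    print(cql, params)
```

Each statement is independent of the current value, so in this model a missing read before the write is expected rather than a bug.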


On Fri, Jan 3, 2014 at 11:00 PM, Adrian Mocanu <amocanu@verticalscope.com> wrote:

Happy New Year all!


I'm working on a solution for the following scenario: I have tuples coming to a Cassandra bolt. The tuples are of this form: TupleData(String name, Int count, Long time). The time field is unique per batch only, not overall, because some tuples may arrive late with the same name and time but a different count.


For example:

I can receive these tuples for the same time: (x1,3,1111), (x2,4,1111)

Then the bolt may receive (x1,5,1111)

After these are put in Cassandra, column family x1 should have the value 8 for time 1111 and column family x2 should have the value 4 for time 1111.
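The expected totals in this example are a sum of counts grouped by (name, time). A small Python sketch of that grouping, modelling the message's TupleData as a `NamedTuple` (an assumption; the `totals` helper is hypothetical):

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class TupleData(NamedTuple):
    name: str
    count: int
    time: int


def totals(tuples: Iterable[TupleData]) -> Dict[Tuple[str, int], int]:
    # Sum counts per (name, time); a late tuple simply adds to its key.
    acc: Dict[Tuple[str, int], int] = defaultdict(int)
    for t in tuples:
        acc[(t.name, t.time)] += t.count
    return dict(acc)


first = [TupleData("x1", 3, 1111), TupleData("x2", 4, 1111)]
late = [TupleData("x1", 5, 1111)]
print(totals(first + late))  # {('x1', 1111): 8, ('x2', 1111): 4}
```

Because addition is commutative, the order in which the late tuple arrives does not change the result, which is the property a counter column relies on.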


Caching aside, the Cassandra bolt needs to check whether there is already a count in the DB for the tuple with the given name and time. If there is, it retrieves the count, increments it by the newly received value, and updates the DB entry with the new value. (At this point I'm not sure whether an update or a delete + reinsert is faster.)

If no db entry exists, then add the new tuple.
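This read-then-update approach works with a single writer, but it is not atomic: if two bolt tasks handle tuples for the same (name, time) at the same time, both can read the same old count and one update is lost. The sketch below replays such an interleaving deterministically against a plain dict standing in for the database; `read`, `write`, and the dict store are hypothetical, not a Cassandra API.

```python
from typing import Dict, Tuple

Key = Tuple[str, int]  # (name, time)


def read(store: Dict[Key, int], key: Key) -> int:
    # A missing entry behaves like "no count yet".
    return store.get(key, 0)


def write(store: Dict[Key, int], key: Key, value: int) -> None:
    store[key] = value


key: Key = ("x1", 1111)

# Sequential read-modify-write: correct.
seq: Dict[Key, int] = {}
for delta in (3, 5):
    write(seq, key, read(seq, key) + delta)

# Interleaved: task A and task B both read before either one writes.
race: Dict[Key, int] = {}
a_seen = read(race, key)      # task A reads 0
b_seen = read(race, key)      # task B reads 0
write(race, key, a_seen + 3)  # task A writes 3
write(race, key, b_seen + 5)  # task B writes 5, overwriting A's update

print(seq[key], race[key])  # 8 5
```

One common mitigation is to make a single task own each key (for example, Storm's fields grouping on name and time); the other is to let the server do the addition, as counter columns do.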


I've looked at the Cassandra bolts' code at https://github.com/hmsonline/storm-cassandra/tree/master/src/main/java/com/hmsonline/storm/cassandra/bolt

which is the same as the Cassandra bolt from storm-contrib.


There is a class CassandraCounterBatchingBolt, but after looking at it I don't believe it looks up the value in the DB before saving it, which leads me to believe that it will not work.


What I'm looking for seems pretty basic, and I wonder if there is a Cassandra bolt that does a DB lookup before updating the DB. Does such an open-source bolt exist?

Otherwise, I'm thinking of building my own on top of CassandraBatchingBolt.


-Adrian
