Brian, if you’re considering using lightweight transactions, you may want to read this:

http://aphyr.com/posts/294-call-me-maybe-cassandra/

From: Brian O'Neill [mailto:boneill42@gmail.com] On Behalf Of Brian O'Neill
Sent: Friday, January 3, 2014 2:10 PM
To: user@storm.incubator.apache.org
Subject: Re: Cassandra bolt

Adrian,

See the email I just sent out to Laurent.  

We have the exact same use case, and we are evaluating the use of lightweight transactions (available in C* 2.0) to accomplish what you described without falling into all the traps involved in a read-before-write counter update.
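
A minimal sketch of the compare-and-set retry loop that lightweight transactions make possible, assuming a hypothetical table `counts` keyed by (name, ts) with a regular bigint column `total`. In CQL the two conditional writes would look like `INSERT ... IF NOT EXISTS` and `UPDATE counts SET total = ? WHERE name = ? AND ts = ? IF total = ?`. The Java below does not talk to Cassandra at all: it models those two writes with `ConcurrentHashMap.putIfAbsent` and `replace(key, expected, updated)`, so it only illustrates the retry pattern, not how C* 2.0 implements LWT or the failure modes described in the link above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for a hypothetical Cassandra table keyed by (name, ts).
// putIfAbsent models "INSERT ... IF NOT EXISTS";
// replace(k, old, new) models "UPDATE ... SET total = new WHERE ... IF total = old".
class CasIncrement {
    static final ConcurrentMap<String, Long> table = new ConcurrentHashMap<>();

    static String key(String name, long time) { return name + ":" + time; }

    /** Adds delta to the stored total, retrying until a conditional write is applied. */
    static long add(String name, long time, long delta) {
        String k = key(name, time);
        while (true) {
            Long current = table.get(k);                            // read current value
            if (current == null) {
                if (table.putIfAbsent(k, delta) == null) {          // "IF NOT EXISTS" applied
                    return delta;
                }
            } else if (table.replace(k, current, current + delta)) { // "IF total = current" applied
                return current + delta;
            }
            // Condition not met: another writer got there first, so re-read and retry.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> { for (int i = 0; i < 1000; i++) add("x1", 1111L, 1); });
        Thread b = new Thread(() -> { for (int i = 0; i < 1000; i++) add("x1", 1111L, 1); });
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(table.get(key("x1", 1111L))); // 2000: no lost updates
    }
}
```

Each conditional write costs extra round trips compared with a plain write, which is the price paid for not losing concurrent increments.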

I think a CassandraState implementation built on top of CQL may suffice.

I can probably get something published to GitHub by Monday or Tuesday.
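
For context on why Trident helps here, a hypothetical sketch (not the Trident or storm-cassandra API) of the idea behind Trident's transactional map state: each stored count carries the transaction id of the batch that last updated it, so a replayed batch is not added twice. It assumes a replayed batch always has the same txid and the same tuples, and that only one task updates a given key at a time; `TxCountStore` and `apply` are made-up names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: stored value = (txid of last batch applied, count).
class TxCountStore {
    static final class TxValue {
        final long txid;
        final long count;
        TxValue(long txid, long count) { this.txid = txid; this.count = count; }
    }

    // Stand-in for the rows a CQL-backed state would read and write.
    private final Map<String, TxValue> rows = new HashMap<>();

    /** Applies one batch's partial sum for a key and returns the stored count. */
    long apply(String key, long txid, long delta) {
        TxValue prev = rows.get(key);
        if (prev != null && prev.txid == txid) {
            return prev.count; // this batch was already applied before the replay: skip it
        }
        long base = (prev == null) ? 0L : prev.count;
        rows.put(key, new TxValue(txid, base + delta));
        return base + delta;
    }

    public static void main(String[] args) {
        TxCountStore store = new TxCountStore();
        store.apply("x1@1111", 1, 3);                     // batch 1
        store.apply("x1@1111", 1, 3);                     // batch 1 replayed: ignored
        System.out.println(store.apply("x1@1111", 2, 5)); // batch 2: prints 8
    }
}
```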

Are you in a position to use Trident?  

Or are you using raw Storm?

-brian

---

Brian O'Neill

Chief Architect

Health Market Science

The Science of Better Results

2700 Horizon Drive  King of Prussia, PA  19406

M: 215.588.6024 @boneill42    

healthmarketscience.com

From: Adrian Mocanu <amocanu@verticalscope.com>
Reply-To: <user@storm.incubator.apache.org>
Date: Friday, January 3, 2014 at 4:00 PM
To: "user@storm.incubator.apache.org" <user@storm.incubator.apache.org>
Subject: Cassandra bolt

Happy New Year all!

I'm working on a solution for the following scenario: tuples arrive at a Cassandra bolt in the form TupleData(String name, Int count, Long time). The time field is unique only within a batch, not overall, because some tuples may arrive late with the same name and time but a different count.

For example:

I can receive these tuples for the same time: (x1,3,1111), (x2,4,1111)

Then the bolt may receive (x1,5,1111)

After these are written to Cassandra, column family x1 should have the value 8 for time 1111 and column family x2 should have the value 4 for time 1111.
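
A quick check of the expected totals in this example, as a hypothetical Java sketch that sums counts per (name, time). `LateTupleExample` and the `name@time` key format are made up for illustration; only `TupleData`'s fields come from the message above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LateTupleExample {
    // Mirrors TupleData(String name, Int count, Long time) from the message above.
    static final class TupleData {
        final String name;
        final int count;
        final long time;
        TupleData(String name, int count, long time) {
            this.name = name; this.count = count; this.time = time;
        }
    }

    /** Sums counts per (name, time): what the stored values should end up as. */
    static Map<String, Integer> expectedTotals(List<TupleData> tuples) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (TupleData t : tuples) {
            totals.merge(t.name + "@" + t.time, t.count, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<TupleData> arrivals = List.of(
            new TupleData("x1", 3, 1111L),
            new TupleData("x2", 4, 1111L),
            new TupleData("x1", 5, 1111L)); // late tuple: same name and time
        System.out.println(expectedTotals(arrivals)); // {x1@1111=8, x2@1111=4}
    }
}
```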

Caching aside, the Cassandra bolt needs to check whether a count already exists in the DB for the tuple's name and time. If it does, the bolt retrieves it, adds the newly received count, and updates the DB entry with the new value. (At this point I'm not sure whether an update or a delete plus reinsert is faster.)

If no DB entry exists, the bolt inserts the new tuple.
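
The check, increment, and write steps above, sketched in Java against a plain in-memory map (a hypothetical stand-in for the table; `get`/`put` play the roles of SELECT and UPDATE). This is correct only if a single writer touches each key: the second part of `main` shows the lost update that happens when two bolt instances read the same old value before either writes. On the update versus delete question, a CQL UPDATE already behaves as an upsert, so a delete plus reinsert should not be needed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the Cassandra table.
class ReadThenWrite {
    static final Map<String, Integer> db = new HashMap<>();

    /** Read-before-write update for one tuple, as described above. */
    static void upsert(String name, long time, int count) {
        String key = name + "@" + time;
        Integer existing = db.get(key);     // 1. look up the current value
        if (existing == null) {
            db.put(key, count);             // 2a. no entry: insert the tuple's count
        } else {
            db.put(key, existing + count);  // 2b. entry exists: write back the sum
        }
    }

    public static void main(String[] args) {
        upsert("x1", 1111L, 3);
        upsert("x2", 1111L, 4);
        upsert("x1", 1111L, 5);
        System.out.println(db.get("x1@1111") + " " + db.get("x2@1111")); // 8 4

        // Lost update: two writers read the same old value before either writes back.
        db.put("y@1", 10);
        int seenByA = db.get("y@1");
        int seenByB = db.get("y@1");
        db.put("y@1", seenByA + 1);
        db.put("y@1", seenByB + 2);
        System.out.println(db.get("y@1")); // 12, not 13
    }
}
```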

I've looked at the Cassandra bolt code at https://github.com/hmsonline/storm-cassandra/tree/master/src/main/java/com/hmsonline/storm/cassandra/bolt, which is the same as the Cassandra bolt in storm-contrib.

There is a class CassandraCounterBatchingBolt, but after looking at it I don't believe it looks up the existing value in the DB before saving, so I don't think it will work for this.
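
If, as its name suggests, CassandraCounterBatchingBolt writes to Cassandra counter columns, a lookup may not be needed: a counter update of the form `UPDATE ... SET total = total + 5 WHERE ...` is applied by Cassandra without the client reading the old value first. How that bolt actually builds its writes is an assumption here, not something checked. The sketch below folds one batch into a single increment per (name, time) and renders the CQL; the table `counter_totals`, its counter column `total`, and its key (name, ts) are hypothetical. One caveat: counter increments are not idempotent, so a replayed batch would be counted twice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: batch of (name, count, time) tuples -> one counter increment per (name, time).
class CounterBatch {
    static final class Tuple {
        final String name;
        final long count;
        final long time;
        Tuple(String name, long count, long time) {
            this.name = name; this.count = count; this.time = time;
        }
    }

    static List<String> toCql(List<Tuple> batch) {
        // Sum locally first so each (name, time) gets a single increment.
        Map<List<Object>, Long> deltas = new LinkedHashMap<>();
        for (Tuple t : batch) {
            deltas.merge(List.of(t.name, t.time), t.count, Long::sum);
        }
        List<String> statements = new ArrayList<>();
        for (Map.Entry<List<Object>, Long> e : deltas.entrySet()) {
            // Real code should bind values in a prepared statement, not concatenate strings.
            statements.add(String.format(
                "UPDATE counter_totals SET total = total + %d WHERE name = '%s' AND ts = %d",
                e.getValue(), e.getKey().get(0), e.getKey().get(1)));
        }
        return statements;
    }

    public static void main(String[] args) {
        List<Tuple> batch = List.of(
            new Tuple("x1", 3, 1111), new Tuple("x2", 4, 1111), new Tuple("x1", 5, 1111));
        toCql(batch).forEach(System.out::println);
    }
}
```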

What I'm looking for seems pretty basic, so I wonder whether there is a Cassandra bolt that does a DB lookup before updating. Does such a bolt exist as open source?

Otherwise I'm thinking of building my own on top of CassandraBatchingBolt.

-Adrian