storm-user mailing list archives

From Michael Oczkowski <Michael.Oczkow...@seeq.com>
Subject RE: Cassandra bolt
Date Fri, 03 Jan 2014 21:11:45 GMT
Brian, if you're considering using transactions, you may want to read this:
http://aphyr.com/posts/294-call-me-maybe-cassandra/


From: Brian O'Neill [mailto:boneill42@gmail.com] On Behalf Of Brian O'Neill
Sent: Friday, January 3, 2014 2:10 PM
To: user@storm.incubator.apache.org
Subject: Re: Cassandra bolt


Adrian,

See the email I just sent out to Laurent.

We have the exact same use case, and we are evaluating the use of lightweight transactions
(available in C* 2.0) to accomplish what you described without falling into all the traps
involved in a read-before-write counter update.
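
To make the trap concrete: with a plain read-before-write update, two bolt tasks can read the same old count and one increment is lost. The sketch below is an in-memory analogy only, not Cassandra client code. It uses `ConcurrentMap.putIfAbsent` and `replace(key, old, new)` to stand in for the conditional writes that C* 2.0 lightweight transactions offer (`INSERT ... IF NOT EXISTS` and `UPDATE ... IF count = ?`). The class name `CasCounterSketch`, the key format, and the table and column names in the comments are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory stand-in for a table keyed by (name, time) holding a count. */
final class CasCounterSketch {
    private final ConcurrentMap<String, Integer> rows = new ConcurrentHashMap<>();

    private static String key(String name, long time) {
        return name + "@" + time;
    }

    /** Adds delta to the stored count, retrying whenever another writer got in first. */
    int add(String name, long time, int delta) {
        String k = key(name, time);
        while (true) {
            Integer current = rows.get(k); // the "read" step
            if (current == null) {
                // Analogous to: INSERT INTO counts (name, time, count) VALUES (?, ?, ?) IF NOT EXISTS
                if (rows.putIfAbsent(k, delta) == null) {
                    return delta;
                }
            } else {
                int updated = current + delta;
                // Analogous to: UPDATE counts SET count = ? WHERE name = ? AND time = ? IF count = ?
                if (rows.replace(k, current, updated)) {
                    return updated;
                }
            }
            // The condition failed because another writer changed the row: loop and re-read.
        }
    }

    /** Returns the stored count, or null if no row exists. */
    Integer get(String name, long time) {
        return rows.get(key(name, time));
    }
}
```

The point of the conditional write is that a stale read is detected at write time instead of silently overwriting a newer value. Against a real cluster each retry costs extra round trips, so this is a sketch of the idea rather than a recommendation.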

I think a CassandraState implementation built on top of CQL may suffice.
I can probably get something published out to github by Monday or Tuesday.

Are you in a position to use Trident?
Or are you using raw Storm?

-brian

---
Brian O'Neill
Chief Architect
Health Market Science
The Science of Better Results
2700 Horizon Drive * King of Prussia, PA * 19406
M: 215.588.6024 * @boneill42<http://www.twitter.com/boneill42>  *
healthmarketscience.com

This information transmitted in this email message is for the intended recipient only and
may contain confidential and/or privileged material. If you received this email in error and
are not the intended recipient, or the person responsible to deliver it to the intended recipient,
please contact the sender at the email above and delete this email and any attachments and
destroy any copies thereof. Any review, retransmission, dissemination, copying or other use
of, or taking any action in reliance upon, this information by persons or entities other than
the intended recipient is strictly prohibited.


From: Adrian Mocanu <amocanu@verticalscope.com>
Reply-To: <user@storm.incubator.apache.org>
Date: Friday, January 3, 2014 at 4:00 PM
To: "user@storm.incubator.apache.org" <user@storm.incubator.apache.org>
Subject: Cassandra bolt

Happy New Year all!

I'm working on a solution for the following scenario: I have tuples coming to a Cassandra
bolt. The tuples are of this form: TupleData(String name, Int count, Long time). The time
field is unique within a batch but not overall, because some tuples may arrive late with the
same name and time but a different count.

For example:
I can receive these tuples for the same time: (x1,3,1111), (x2,4,1111).
Then the bolt may receive (x1,5,1111).
After these are put in Cassandra, column family x1 should have value 8 for time 1111 and column
family x2 should have value 4 for time 1111.

Caching aside, the Cassandra bolt needs to check whether there is already a count in the db for
the tuple with the given name and time. If one exists, it should retrieve it, increment it by the
newly received value, and update the db entry with the new value. (At this point I'm not sure
whether update or delete+reinsert is speedier.)
If no db entry exists, then add the new tuple.
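
The lookup-then-write logic described above can be sketched as follows, with a plain in-memory map standing in for Cassandra. `TupleData` mirrors the tuple shape from this message; `CountStore`, its methods, and the key format are hypothetical names for illustration. This is single-writer logic only and does not address what happens when several bolt tasks update the same entry concurrently.

```java
import java.util.HashMap;
import java.util.Map;

/** Tuple shape taken from the message: (name, count, time). */
final class TupleData {
    final String name;
    final int count;
    final long time;

    TupleData(String name, int count, long time) {
        this.name = name;
        this.count = count;
        this.time = time;
    }
}

/** Hypothetical stand-in for the database; a HashMap replaces Cassandra here. */
final class CountStore {
    private final Map<String, Integer> db = new HashMap<>();

    private static String key(String name, long time) {
        return name + "@" + time;
    }

    /** Looks up the existing count; adds to it if present, otherwise inserts the tuple. */
    void upsert(TupleData t) {
        String k = key(t.name, t.time);
        Integer existing = db.get(k);       // db lookup
        if (existing == null) {
            db.put(k, t.count);             // no entry yet: insert the new tuple
        } else {
            db.put(k, existing + t.count);  // entry exists: overwrite with the sum
        }
    }

    /** Returns the stored count, or null if there is no entry. */
    Integer countFor(String name, long time) {
        return db.get(key(name, time));
    }
}
```

Feeding it (x1,3,1111), (x2,4,1111) and then (x1,5,1111) leaves 8 for x1 and 4 for x2 at time 1111, matching the example above.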

I've looked at the Cassandra bolt code at https://github.com/hmsonline/storm-cassandra/tree/master/src/main/java/com/hmsonline/storm/cassandra/bolt
which is the same as the Cassandra bolt from storm-contrib.

There is a class CassandraCounterBatchingBolt, but after looking at it I don't believe it
looks up the existing value in the db before saving the new one, which leads me to believe that
it will not work for this case.

What I'm looking for seems pretty basic, and I wonder if there is a Cassandra bolt that does a
db lookup before updating the db. Is there an open-source bolt that does this?
Otherwise I'm thinking of building mine on top of CassandraBatchingBolt.

-Adrian

