storm-user mailing list archives

From Brian O'Neill <b...@alumni.brown.edu>
Subject Re: Cassandra bolt
Date Fri, 03 Jan 2014 21:09:57 GMT

Adrian,

See the email I just sent out to Laurent.

We have the exact same use case, and we are evaluating the use of
lightweight transactions (available in C* 2.0) to accomplish what you
described without falling into all the traps involved in a read-before-write
counter update.
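For illustration only, here is a minimal sketch of the compare-and-set retry loop that lightweight transactions make possible. It does not talk to Cassandra. A ConcurrentHashMap stands in for a hypothetical `counts` table: `putIfAbsent` plays the role of `INSERT ... IF NOT EXISTS`, and `replace(key, expected, updated)` plays the role of `UPDATE ... IF total = ?`. The table name, its columns, and the `casAdd` helper are assumptions, not part of any published CassandraState.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CasIncrement {
    // In-memory stand-in for a HYPOTHETICAL table:
    //   CREATE TABLE counts (name text, ts bigint, total bigint, PRIMARY KEY (name, ts));
    static final ConcurrentHashMap<String, Long> table = new ConcurrentHashMap<>();

    // Retries until a conditional write succeeds, the same shape as a
    // lightweight-transaction loop against C* 2.0.
    static void casAdd(String name, long ts, long delta) {
        String key = name + ":" + ts;
        while (true) {
            Long current = table.get(key);
            if (current == null) {
                // ~ INSERT INTO counts (name, ts, total) VALUES (?, ?, ?) IF NOT EXISTS
                if (table.putIfAbsent(key, delta) == null) return;
            } else {
                // ~ UPDATE counts SET total = ? WHERE name = ? AND ts = ? IF total = ?
                if (table.replace(key, current, current + delta)) return;
            }
            // Condition failed: another writer got there first, so re-read and retry.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 1000; i++) {
            pool.submit(() -> casAdd("x1", 1111L, 1));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(table.get("x1:1111")); // 1000
    }
}
```

With real lightweight transactions each conditional write costs a Paxos round, so it is slower than a plain write. The loop also does not make replayed tuples idempotent on its own: a replayed tuple would still be added twice.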

I think a CassandraState implementation built on top of CQL may suffice.
I can probably get something published out to github by Monday or Tuesday.

Are you in a position to use Trident?
Or are you using raw Storm?

-brian

---
Brian O'Neill
Chief Architect
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> • healthmarketscience.com




From:  Adrian Mocanu <amocanu@verticalscope.com>
Reply-To:  <user@storm.incubator.apache.org>
Date:  Friday, January 3, 2014 at 4:00 PM
To:  "user@storm.incubator.apache.org" <user@storm.incubator.apache.org>
Subject:  Cassandra bolt

Happy New Year all!
 
I'm working on a solution for the following scenario: I have tuples coming
to a Cassandra bolt. The tuples are of this form: TupleData(String name, Int
count, Long time). The time field is unique per batch but not overall,
because some tuples may arrive late with the same name and time but a
different count.
 
For example:
I can receive these tuples for the same time: (x1,3,1111), (x2,4,1111)
Then the bolt may receive (x1,5,1111)
After these are put in cassandra, column family x1 should have value 8 for
time 1111 and column family x2 should have value 4 for time 1111
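To pin down the intended result, here is a small Java sketch (Java 16+ for the record type) that sums counts per (name, time) key. `TupleData` mirrors the shape given above; the "name:time" string key and the `totals` helper are illustrative assumptions, not existing code.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ExpectedTotals {
    // Mirrors TupleData(String name, Int count, Long time) from the message.
    record TupleData(String name, int count, long time) {}

    // Sums counts per (name, time), regardless of which batch a tuple arrived in.
    static Map<String, Integer> totals(List<TupleData> tuples) {
        Map<String, Integer> out = new TreeMap<>();
        for (TupleData t : tuples) {
            String key = t.name() + ":" + t.time();   // hypothetical key format
            out.merge(key, t.count(), Integer::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<TupleData> tuples = List.of(
            new TupleData("x1", 3, 1111L),
            new TupleData("x2", 4, 1111L),
            new TupleData("x1", 5, 1111L));          // late arrival for x1
        System.out.println(totals(tuples));          // {x1:1111=8, x2:1111=4}
    }
}
```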
 
Caching aside, the Cassandra bolt needs to check whether the db already
holds a count for the tuple's name and time. If it does, the bolt should
retrieve it, increment it by the newly received count, and update the db
entry with the new value. (At this point I'm not sure whether an update or
a delete+reinsert is faster.)
If no db entry exists, the bolt should insert the new tuple.
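A minimal sketch of the lookup-then-update logic just described, assuming a hypothetical `CountStore` interface in place of real Cassandra calls (an in-memory map backs it so the example runs standalone). It produces the intended totals, but the read and the write are separate steps, so it is not safe under concurrent writers; that is the read-before-write trap mentioned in the reply above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class ReadBeforeWrite {
    // HYPOTHETICAL stand-in for the Cassandra table; a real bolt would issue
    // a SELECT for read() and an INSERT/UPDATE for write().
    interface CountStore {
        Optional<Long> read(String name, long time);
        void write(String name, long time, long count);
    }

    static class InMemoryStore implements CountStore {
        private final Map<String, Long> rows = new HashMap<>();
        public Optional<Long> read(String name, long time) {
            return Optional.ofNullable(rows.get(name + ":" + time));
        }
        public void write(String name, long time, long count) {
            rows.put(name + ":" + time, count);
        }
    }

    // Look up the existing count (0 if absent), add the new one, write it back.
    // NOT atomic: two writers can both read the same old value and one
    // increment is lost when the second write overwrites the first.
    static void upsert(CountStore store, String name, long time, long delta) {
        long current = store.read(name, time).orElse(0L);
        store.write(name, time, current + delta);
    }

    public static void main(String[] args) {
        CountStore store = new InMemoryStore();
        upsert(store, "x1", 1111L, 3);
        upsert(store, "x2", 1111L, 4);
        upsert(store, "x1", 1111L, 5);
        System.out.println(store.read("x1", 1111L).orElse(0L)); // 8
        System.out.println(store.read("x2", 1111L).orElse(0L)); // 4
    }
}
```

Because Cassandra treats both INSERT and UPDATE as upserts, the write step needs no separate insert and update branches. A delete followed by a reinsert would still need the read and would also leave a tombstone behind.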
 
I've looked at the Cassandra bolt code at
https://github.com/hmsonline/storm-cassandra/tree/master/src/main/java/com/hmsonline/storm/cassandra/bolt
which is the same as the Cassandra bolt from storm-contrib.
 
There is a class CassandraCounterBatchingBolt, but after looking at it I
don't believe it looks up the existing value in the db before saving the new
value, which leads me to believe it will not work for this case.
 
What I'm looking for seems pretty basic, and I wonder whether there is a
Cassandra bolt that does a db lookup before updating the db. Is there an
open-source bolt like that?
Otherwise I'm thinking of building my own on top of CassandraBatchingBolt.
 
-Adrian
 


