storm-user mailing list archives

From Adrian Mocanu <amoc...@verticalscope.com>
Subject RE: Cassandra bolt
Date Mon, 06 Jan 2014 20:02:39 GMT
Thanks again Brian!
You've been very helpful.

I will eventually migrate to CQL and make a CQL Cassandra batch counting bolt.

-A

From: Brian O'Neill [mailto:boneill42@gmail.com] On Behalf Of Brian O'Neill
Sent: January-06-14 10:43 AM
To: user@storm.incubator.apache.org
Subject: Re: Cassandra bolt


Astyanax is performing the increment using counter columns.

In storm-cassandra, the code for incrementing the column value is here:

AstyanaxClient.java:422
    mutation.withRow(columnFamily, rowKey)
            .incrementCounterColumn(columnName, incrementAmount);

This uses the counter column mechanisms exposed by Astyanax.  For more information, go here:
https://github.com/Netflix/astyanax/wiki/Working-with-counter-columns

This should work, except for the caveats mentioned already.  Cassandra is addressing this
under: https://issues.apache.org/jira/browse/CASSANDRA-4775

-brian

---
Brian O'Neill
Chief Architect
Health Market Science
The Science of Better Results
2700 Horizon Drive * King of Prussia, PA * 19406
M: 215.588.6024 * @boneill42<http://www.twitter.com/boneill42>  *
healthmarketscience.com



From: Adrian Mocanu <amocanu@verticalscope.com>
Reply-To: <user@storm.incubator.apache.org>
Date: Monday, January 6, 2014 at 10:21 AM
To: "user@storm.incubator.apache.org" <user@storm.incubator.apache.org>
Subject: RE: Cassandra bolt

Hi
I am actually looking into using CassandraCounterBatchingBolt, but at the moment I'm not sure
how Cassandra handles these eventual consistency issues, so I need to research that. The reason
I mention this issue is that I cannot find anywhere in the code where a read happens before a
write, which bothers me. Maybe Cassandra does it with counter columns? I don't know.

The issue I'm talking about is updating the same counter consecutively, but faster than the
updates propagate to other Cassandra nodes.

Example:
Say I have 3 cassandra nodes. The counters on each of these nodes are 0.
Node1:0, node2:0, node3:0

An increment comes: 5
5 -> Node1:0, node2:0, node3:0

Increment of 5 starts at node2 - still needs to propagate to node1 and node3
Node1:0, node2:5, node3:0

In the meantime, another increment arrives before the previous increment is propagated:
3 -> Node1:0, node2:5, node3:0

Assuming 3 starts at a different node than where 5 started we have:
Node1:3, node2:5, node3:0

Now if 3 gets propagated to the other nodes AS AN INCREMENT and not as a new value (and the
same for 5) then eventually they would all equal 8 and this is what I want.

If 3 overwrites 5 (because it has a later timestamp) this is problematic - not what I want.
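
The two outcomes described above can be sketched with a small in-memory simulation. This is only an illustration of the scenario in this message, not a description of how Cassandra implements counters internally; the class and method names are hypothetical, and the starting state is the one from the example (Node1:3, node2:5, node3:0).

```java
import java.util.Arrays;

// Hypothetical sketch: three replicas modeled as a long[]; not Cassandra code.
final class CounterPropagationSketch {

    // Propagate each update as a delta: every node except the one where the
    // update started adds the amount to its local value.
    static long[] propagateAsIncrements(long[] nodes, int[] origins, long[] amounts) {
        long[] result = nodes.clone();
        for (int u = 0; u < amounts.length; u++) {
            for (int n = 0; n < result.length; n++) {
                if (n != origins[u]) {
                    result[n] += amounts[u];
                }
            }
        }
        return result;
    }

    // Propagate each update as a plain value: the write with the latest
    // timestamp wins on every node. amounts is ordered oldest first.
    static long[] propagateAsOverwrites(long[] nodes, long[] amounts) {
        long[] result = nodes.clone();
        Arrays.fill(result, amounts[amounts.length - 1]);
        return result;
    }

    public static void main(String[] args) {
        long[] afterLocalWrites = {3, 5, 0}; // Node1:3, node2:5, node3:0
        int[] origins = {1, 0};              // 5 started at node2, 3 started at node1
        long[] amounts = {5, 3};             // ordered by timestamp, oldest first

        // prints [8, 8, 8]
        System.out.println(Arrays.toString(
                propagateAsIncrements(afterLocalWrites, origins, amounts)));
        // prints [3, 3, 3]
        System.out.println(Arrays.toString(
                propagateAsOverwrites(afterLocalWrites, amounts)));
    }
}
```

The first result is the behaviour wanted here (all nodes converge to 8); the second is the problematic case where the later write replaces the earlier one.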

Will see what the Cassandra group says... or if the creators of CassandraCounterBatchingBolt
are on this group, please let me know :)

Thanks
Adrian


From: Vladi Feigin [mailto:vladif86@gmail.com]
Sent: January-04-14 2:00 AM
To: user@storm.incubator.apache.org
Subject: Re: Cassandra bolt

Hi Adrian,

Why don't you use C* counters? Your scenario looks like a fit for them. I think CassandraCounterBatchingBolt
provides what you need.
Vladi

On Fri, Jan 3, 2014 at 11:00 PM, Adrian Mocanu <amocanu@verticalscope.com>
wrote:
Happy New Year all!

I'm working on a solution for the following scenario: I have tuples coming to a Cassandra
bolt. The tuples are of this form: TupleData(String name, Int count, Long time). The time field
is unique within a batch but not overall, because some tuples may arrive late with the
same name and time but a different count.

For example:
I can receive these tuples for the same time: (x1,3,1111), (x2,4,1111)
Then the bolt may receive (x1,5,1111)
After these are put in cassandra, column family x1 should have value 8 for time 1111 and column
family x2 should have value 4 for time 1111

Caching aside, the Cassandra bolt needs to check whether there is already a count in the db for
the tuple with the given name and time. If one exists, then retrieve it, increment it by the newly
received value, and update the db entry with the new value. (At this point I'm not sure if update
or delete+reinsert is speedier.)
If no db entry exists, then add the new tuple.
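
As a rough sketch of the lookup-then-update logic just described, the following uses an in-memory map as a stand-in for the database. Everything here is an assumption for illustration: the class name, the "name@time" key format, and the map itself are hypothetical and are not part of storm-cassandra or any Cassandra client.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a HashMap stands in for the db; not storm-cassandra code.
final class TupleCountSketch {

    // Key is "name@time", value is the running count for that name and time.
    private final Map<String, Long> store = new HashMap<>();

    // Lookup first: retrieve the existing count (if any), add the new one,
    // and write the result back. If no entry exists, insert the new count.
    void addWithLookup(String name, long count, long time) {
        String key = name + "@" + time;
        Long existing = store.get(key);
        store.put(key, existing == null ? count : existing + count);
    }

    // Returns the stored count, or 0 if nothing was stored for this name and time.
    long get(String name, long time) {
        return store.getOrDefault(name + "@" + time, 0L);
    }

    public static void main(String[] args) {
        TupleCountSketch sketch = new TupleCountSketch();
        sketch.addWithLookup("x1", 3, 1111L);
        sketch.addWithLookup("x2", 4, 1111L);
        sketch.addWithLookup("x1", 5, 1111L); // late tuple, same name and time

        System.out.println("x1@1111 = " + sketch.get("x1", 1111L)); // prints x1@1111 = 8
        System.out.println("x2@1111 = " + sketch.get("x2", 1111L)); // prints x2@1111 = 4
    }
}
```

With a single in-memory map the read and the write cannot interleave with another writer; against a real database the same read-then-write sequence would need some form of coordination, which is the concern raised in this thread.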

I've looked at cassandra bolts code from https://github.com/hmsonline/storm-cassandra/tree/master/src/main/java/com/hmsonline/storm/cassandra/bolt
which is the same as cassandra bolt from storm-contrib.

There is a class CassandraCounterBatchingBolt, but after looking at it I don't believe it
does the look up in db first before saving the value to db, which leads me to believe that
this will not work.

What I'm looking for seems pretty basic and I wonder if there is a cassandra bolt to do db
lookup before updating db. Does such a bolt exist open-sourced?
Otherwise I'm thinking of building mine on top of CassandraBatchingBolt.

-Adrian


