storm-user mailing list archives

From Brian O'Neill <>
Subject Re: hmsonline/storm-cassandra
Date Fri, 03 Jan 2014 21:03:42 GMT

Great input.  I agree.

For the same reasons, we've been looking at developing another
implementation of CassandraState that is less restrictive, leveraging the
CQL java-driver from Datastax.  CQL has better support for collections, as
well as lightweight transactions.  We plan to use both.
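To make the CQL point concrete, here is a small Java sketch that only builds statement strings, so it runs without a cluster. The class `CqlStatements` and its method names are hypothetical. The map append uses CQL's collection syntax, and the two `IF` forms are lightweight transactions, which require Cassandra 2.0 or later. With the Datastax java-driver, strings like these would typically be passed to `session.prepare(...)` and bound once per tuple.

```java
import java.util.List;

/**
 * Hypothetical helper that builds CQL3 statements with bind markers.
 * No driver is needed to run it; the strings are meant to be prepared
 * once and bound per tuple.
 */
final class CqlStatements {
    private CqlStatements() {}

    /** Adds the bound entries to a map collection column: m = m + ? */
    static String mapAppend(String table, String mapColumn, String keyColumn) {
        return "UPDATE " + table + " SET " + mapColumn + " = " + mapColumn + " + ?"
                + " WHERE " + keyColumn + " = ?";
    }

    /** Lightweight transaction: only applies if the current value matches. */
    static String compareAndSet(String table, String valueColumn, String keyColumn) {
        return "UPDATE " + table + " SET " + valueColumn + " = ?"
                + " WHERE " + keyColumn + " = ? IF " + valueColumn + " = ?";
    }

    /** Lightweight transaction: only inserts when the row does not exist yet. */
    static String insertIfNotExists(String table, List<String> columns) {
        StringBuilder names = new StringBuilder();
        StringBuilder markers = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                names.append(", ");
                markers.append(", ");
            }
            names.append(columns.get(i));
            markers.append('?');
        }
        return "INSERT INTO " + table + " (" + names + ") VALUES (" + markers
                + ") IF NOT EXISTS";
    }
}
```

Binding a whole `java.util.Map` to the single marker in `mapAppend` lets one statement write a tuple's full <column name, value> map, which is close to the hashmap use case described further down the thread.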

If you aren't too far along, it might make sense to use CQL (over

If you are interested in collaborating, shoot me a direct email.

bone AT alumni DOT brown DOT edu

all the best,

Brian O'Neill
Chief Architect
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <> •


From:  Laurent Thoulon <>
Reply-To:  <>
Date:  Friday, January 3, 2014 at 4:27 AM
To:  <>
Subject:  hmsonline/storm-cassandra


We've been using Cassandra in our topologies for some time now. When we
started, there was no CassandraState that suited our needs, so we basically
reinvented the wheel based on an old CassandraState that used Hector.
What we implemented is a CassandraMapState that uses dynamic column family
names, rowkeys and column names, and that supports Composites. By dynamic, I
mean they can be fetched from the tuple.
It works nicely, but we've been seeing some performance issues when scaling,
and we think they may be coming from Hector's batch mutations.
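A minimal sketch of what "fetched from the tuple" could mean, assuming the column family name, the rowkey components and the column-name components are each taken from named tuple fields. `DynamicTarget` and its parameters are hypothetical, and the tuple is modelled as parallel lists of field names and values so the sketch runs without Storm on the classpath.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch: resolve the column family, the (possibly composite)
 * rowkey and the (possibly composite) column name of a write from the
 * current tuple instead of from fixed options.
 */
final class DynamicTarget {
    final String columnFamily;
    final List<Object> rowKey;      // one component per rowkey field (composite if > 1)
    final List<Object> columnName;  // one component per column-name field

    DynamicTarget(String columnFamily, List<Object> rowKey, List<Object> columnName) {
        this.columnFamily = columnFamily;
        this.rowKey = rowKey;
        this.columnName = columnName;
    }

    /** Picks the named fields out of the tuple to build the write target. */
    static DynamicTarget fromTuple(List<String> fieldNames, List<Object> values,
                                   String cfField, List<String> rowKeyFields,
                                   List<String> columnFields) {
        String cf = String.valueOf(values.get(indexOf(fieldNames, cfField)));
        return new DynamicTarget(cf, pick(fieldNames, values, rowKeyFields),
                pick(fieldNames, values, columnFields));
    }

    private static List<Object> pick(List<String> names, List<Object> values,
                                     List<String> wanted) {
        List<Object> out = new ArrayList<Object>(wanted.size());
        for (String field : wanted) {
            out.add(values.get(indexOf(names, field)));
        }
        return Collections.unmodifiableList(out);
    }

    private static int indexOf(List<String> names, String field) {
        int i = names.indexOf(field);
        if (i < 0) {
            throw new IllegalArgumentException("Tuple has no field " + field);
        }
        return i;
    }
}
```

Keeping the field selection in the state rather than in fixed options is what allows one state implementation to serve several column families and key layouts.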

I'm not going to go through all our thoughts, but we also decided to rebuild
our topologies to make them smaller, with fewer goals, so that we can
pinpoint the bottlenecks more easily.

Just so everything is said, we're using Trident.

Now, we're considering using Astyanax, so we thought it might be a good
idea to try hmsonline/storm-cassandra, as it's part of Storm's contrib.
We've successfully implemented a basic use case, but we're now facing some
more complex ones. Our main problem is that the CassandraMapState seems to
restrict us to a particular schema for the CFs: keys being composites, and
the column name, column family and TTL being fixed in the options. These are
the same kinds of reasons that led us to refactor the CassandraMapState in
the first place. We're actually surprised no one seems to have had the same
needs, and we think there may be a better approach to what we want to do
that we haven't thought of.

We have two kinds of topologies we're building:
- Topologies that store counters in an opaque way in various column
families (for various granularities), using rowkeys that can be composite or
not, and dynamic column names (timestamps, or composites made of ids and
timestamps, depending on the current tuple)
- Topologies that store, in a non-transactional way, a hashmap of <column
name, column value> in a rowkey that depends on the tuple
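For the first kind of topology, "opaque" refers to Trident's opaque-transactional state: each stored cell keeps the txid of the batch that last wrote it together with the value before and after that batch, so a replayed batch (which may contain different tuples) is re-applied on top of the previous value rather than counted twice. The hypothetical `OpaqueCounter` below sketches that update rule in isolation; Trident's own `OpaqueValue` class follows the same idea. Each (column family, rowkey, column) counter cell would hold one such triple.

```java
/**
 * Minimal sketch of the opaque-transactional update rule for one counter
 * cell. Hypothetical class, not the storm-cassandra or Trident implementation.
 */
final class OpaqueCounter {
    final long txid;  // txid of the batch that produced curr
    final long prev;  // value before that batch
    final long curr;  // value after that batch

    OpaqueCounter(long txid, long prev, long curr) {
        this.txid = txid;
        this.prev = prev;
        this.curr = curr;
    }

    /** A cell that has never been written. */
    static OpaqueCounter empty() {
        return new OpaqueCounter(-1L, 0L, 0L);
    }

    /** Applies a batch's partial count and returns the value to write back. */
    OpaqueCounter apply(long batchTxid, long delta) {
        if (batchTxid == txid) {
            // Replay of the batch that last wrote this cell: recompute from prev.
            return new OpaqueCounter(batchTxid, prev, prev + delta);
        }
        if (batchTxid > txid) {
            // New batch: the current value becomes the previous one.
            return new OpaqueCounter(batchTxid, curr, curr + delta);
        }
        throw new IllegalStateException(
                "Batch " + batchTxid + " is behind stored txid " + txid);
    }
}
```

Because the whole triple must be read and written back, such counters are stored as regular columns holding serialized values rather than as native Cassandra counter columns.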

Does anyone have the same needs?
Do you have any advice on how to achieve our goals in the most efficient
way?
Should we just use our own CassandraState and move it to Astyanax?
We'd be glad to talk about this and share our knowledge with the community.

If you'd like to see what we've done with our homebrewed CassandraState, I
created this Gist:

Here's what a partitionPersist looks like:

                new Fields("timestamp", "e", "a", "c", "r", "count"),   // input fields
                // likely: column family, rowkey fields, column-name fields, value fields, TTL
                new CassandraMultiputUpdater(CfStats.CF, new Fields("a", "c", "r", "e"),
                        new Fields("timestamp"), new Fields("count"), CfStats.TTL)
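One way a multiput-style updater could limit the number of mutations is to group a whole batch by rowkey first and then write each row once. The generic `RowBatcher` below is a hypothetical sketch of that grouping, using plain String keys (composite keys would be encoded before this step) and letting a later value for the same cell overwrite an earlier one, which matches the non-transactional hashmap case rather than counters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: groups a batch's writes by rowkey so each row can be
 * written with a single mutation instead of one mutation per tuple.
 */
final class RowBatcher<V> {
    // Insertion-ordered: rowkey -> (column name -> value)
    private final Map<String, Map<String, V>> rows =
            new LinkedHashMap<String, Map<String, V>>();

    /** Records one cell; a later value for the same (row, column) wins. */
    void put(String rowKey, String column, V value) {
        Map<String, V> columns = rows.get(rowKey);
        if (columns == null) {
            columns = new LinkedHashMap<String, V>();
            rows.put(rowKey, columns);
        }
        columns.put(column, value);
    }

    /** One entry per row to write, so the mutation count equals rows().size(). */
    Map<String, Map<String, V>> rows() {
        return rows;
    }
}
```

For counters the `put` would add to the existing value instead of overwriting it; the grouping step stays the same.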

And what a stateQuery looks like:

                new Fields("a", "c"),                                  // input fields
                new CassandraMapGet(CfUser.CF, new Fields("a", "c")),  // query function
                // config in getCassandraState sets a limit of 1 and a range on columns for CfUser.CF
                new Fields("mapWithOneResult")                         // output field
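The limit-and-range configuration roughly corresponds to a column slice: within a row, columns are kept sorted by name, and the query returns at most `limit` columns between a start and an end name, optionally in reverse order. The hypothetical `ColumnSlice.slice` helper sketches this over a `TreeMap`, with String column names standing in for serialized names or composites.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;

/**
 * Hypothetical sketch of a column slice over one row whose columns are
 * sorted by name.
 */
final class ColumnSlice {
    private ColumnSlice() {}

    /** Returns at most limit columns with names in [start, end], in slice order. */
    static <V> Map<String, V> slice(NavigableMap<String, V> row, String start, String end,
                                    boolean reversed, int limit) {
        NavigableMap<String, V> range = row.subMap(start, true, end, true);
        if (reversed) {
            range = range.descendingMap();  // walk from end back to start
        }
        Map<String, V> result = new LinkedHashMap<String, V>();  // keeps slice order
        for (Map.Entry<String, V> e : range.entrySet()) {
            if (result.size() >= limit) {
                break;
            }
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

With timestamp column names, a reversed slice with a limit of 1 returns the most recent column in the range.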

Here are the versions we use:
- Java 6
- Kafka 0.7
- Storm 0.9.0-wip16
- Cassandra 1.2.4
We're considering upgrading these to Java 7, Kafka 0.8, Storm 0.9 and Cassandra 2.

