cassandra-user mailing list archives

From "Marcelo Valle (BLOOMBERG/ LONDON)" <>
Subject Re: best supported spark connector for Cassandra
Date Fri, 13 Feb 2015 14:05:02 GMT
Actually, I am not the one looking for support, but thank you very much anyway.
But from your message I gather the answer is yes: Datastax is not the only Cassandra vendor
offering support and modifying the official Cassandra source at the moment. Is that right?
Subject: Re: best supported spark connector for Cassandra

Of course, Stratio Deep and Stratio Cassandra are licensed under Apache 2.0.

Regarding Cassandra support, I can introduce you to someone at Stratio who can help you.

2015-02-12 15:05 GMT+01:00 Marcelo Valle (BLOOMBERG/ LONDON) <>:

Thanks for the hint Gaspar. 
Do you know if Stratio Deep / Stratio Cassandra are also licensed Apache 2.0?

I was interested in learning more about Stratio when I was working at a startup. Now, at a
blue-chip company, it seems one of the hardest obstacles to using Cassandra in a project is
the need for an area supporting it, and people seem especially concerned about how many
vendors provide support for an open source solution.

This seems to be something of an advantage for HBase, as there are many vendors supporting it, but
I wonder if Stratio can be considered an alternative to Datastax regarding Cassandra support?

It's not my call to decide anything here, but as part of the community it helps to have this
business scenario clear. I could say Cassandra is the best technical fit for
some projects, but sometimes non-technical factors come into play, like this need for having
more than one vendor available...

Subject: Re: best supported spark connector for Cassandra

My suggestion is to use Java or Scala instead of Python. For Java/Scala both the Datastax
and Stratio drivers are valid and similar options. As far as I know they both take care of
data locality and are not based on the Hadoop interface. The advantage of Stratio Deep is
that it allows you to integrate Spark not only with Cassandra but with MongoDB, Elasticsearch,
Aerospike and others as well.
Stratio maintains a fork of Cassandra that includes some additional features, such as Lucene-based
secondary indexes. So the Stratio driver works fine with Apache Cassandra and also with their

You can find some examples of using Deep here:  Please
if you need some help with Stratio Deep do not hesitate to contact us.

2015-02-11 17:18 GMT+01:00 shahab <>:

I am using the Calliope cassandra-spark connector, which
is quite handy and easy to use!
The only problem is that it is a bit outdated and works with Spark 1.1.0; hopefully a new version
comes soon.


On Wed, Feb 11, 2015 at 2:51 PM, Marcelo Valle (BLOOMBERG/ LONDON) <>

I just finished a Scala course, nice exercise to check what I learned :D

Thanks for the answer!

Subject: Re: best supported spark connector for Cassandra

Start looking at the Spark/Cassandra connector here (in Scala):

Data locality is provided by this method:

Start digging from this all the way down the code.
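
As a rough illustration of what data locality means here, the sketch below is a toy model
and not the connector's actual code. It assumes a simplified token ring where each node owns
the range ending at its token and the next nodes clockwise hold the replicas. The ring, the
host addresses and all function names are hypothetical. Each split carries a list of preferred
hosts (the same idea as an RDD partition's preferred locations in Spark), and the scheduler
tries to run the task on one of them.

```python
from bisect import bisect_left

# Hypothetical four-node ring: (token, host) pairs sorted by token.
RING = [(-6000, "10.0.0.1"), (-2000, "10.0.0.2"),
        (2000, "10.0.0.3"), (6000, "10.0.0.4")]


def replicas_for_token(token, ring, rf):
    """Hosts holding `token`: the first node whose token is >= `token`
    (wrapping around the ring), plus the next rf - 1 nodes clockwise."""
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, token) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))]


def build_splits(ring, rf):
    """One split per token range (prev_token, token], with its preferred hosts."""
    splits = []
    for i, (token, _) in enumerate(ring):
        prev_token = ring[i - 1][0]  # i == 0 wraps to the last node
        splits.append(((prev_token, token), replicas_for_token(token, ring, rf)))
    return splits


def assign(splits, executors):
    """Pick an executor for each split, preferring one on a replica host."""
    plan = []
    for token_range, hosts in splits:
        local = [h for h in hosts if h in executors]
        host = local[0] if local else sorted(executors)[0]
        plan.append((token_range, host, bool(local)))
    return plan


if __name__ == "__main__":
    for token_range, host, is_local in assign(build_splits(RING, rf=2),
                                              {"10.0.0.2", "10.0.0.4"}):
        print(token_range, host, "local" if is_local else "remote")
```

With executors on only two of the four hosts and a replication factor of 2, every split in
this toy ring can still be read from a local replica, which is the effect a locality-aware
connector is after.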

As for Stratio Deep, I can't tell how they did the integration with Spark. Take some time to
dig into their code to understand the logic.

On Wed, Feb 11, 2015 at 2:25 PM, Marcelo Valle (BLOOMBERG/ LONDON) <>

Since Spark was being discussed in another thread, I decided to take the opportunity to start a new
one, as I am interested in using Spark + Cassandra in the future.

About 3 years ago, Spark was not yet an option and we tried to use Hadoop to process
Cassandra data. My experience was horrible and we reached the conclusion it was faster to
develop an internal tool than to insist on Hadoop _for our specific case_.

As far as I can see, Spark is starting to be known as a "better Hadoop" and it seems the market is going
this way now. I can also see I have many more options for integrating Cassandra
using the Spark RDD concept than using the ColumnFamilyInputFormat.

I have found this Java driver made by Datastax:

I have also found Python Cassandra support in Spark's repo, but it still seems experimental:

Finally, I have found Stratio Deep:
It seems the Stratio guys have also forked Cassandra; I am still a little confused about it.

Question: which driver should I use if I want to use Java? And which if I want to use Python?

I think the way Spark integrates with Cassandra makes all the difference in the world, from
my past experience, so I would like to know more about it, but I don't even know which source
code I should start looking at...
I would like to integrate using Python and/or C++, but I wonder whether it would pay off
to use the Java driver instead.

Thanks in advance


Gaspar Muñoz 

Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // @stratiobd


