cassandra-user mailing list archives

From DuyHai Doan <doanduy...@gmail.com>
Subject Re: cassandra + spark / pyspark
Date Wed, 10 Sep 2014 16:13:22 GMT
Hello Oleg

Question 2: yes. The official spark cassandra connector can be found here:
https://github.com/datastax/spark-cassandra-connector

The docs are in the doc/ folder. You can read from and write to Cassandra
directly without EVER using HDFS. For high availability of your Spark
cluster you still need a resource manager such as Apache Mesos, or you can
run in standalone mode and manage failover yourself. The choice is yours.

Question 3: yes, you can save a massive amount of data into Cassandra.

Question 4: I've played with it a little and it's quite smart. Data
locality is guaranteed by mapping each Spark RDD partition directly to the
Cassandra node that owns the primary partition range. I have not yet used
it in production, though, so I can't say anything about its stability.
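
To make the locality idea concrete, here is a minimal sketch in plain Python.
It is a simplified model, not the connector's actual code: the ring layout,
the node names and the helper functions (primary_ranges, plan_partitions,
owner_of) are all hypothetical. It assumes each node owns the token range
that ends at its own token, and it creates one partition per primary range
with the owning node recorded as the preferred location.

```python
from bisect import bisect_left

# Hypothetical token ring: (token, node) pairs. Real clusters use a far
# larger token space and usually many tokens per node.
RING = [(-6000, "node-a"), (-2000, "node-b"), (2000, "node-c"), (6000, "node-d")]


def primary_ranges(ring):
    """Return (start, end, owner) ranges; each node owns (previous_token, own_token]."""
    ordered = sorted(ring)
    ranges = []
    for i, (token, node) in enumerate(ordered):
        # For i == 0 the index -1 wraps around to the last token on the ring.
        prev_token = ordered[i - 1][0]
        ranges.append((prev_token, token, node))
    return ranges


def plan_partitions(ring):
    """Build one partition per primary range, tagged with the node that owns it."""
    return [
        {"partition": idx, "token_range": (start, end), "preferred_node": node}
        for idx, (start, end, node) in enumerate(primary_ranges(ring))
    ]


def owner_of(token, ring):
    """Find the node whose primary range contains the given token."""
    ordered = sorted(ring)
    tokens = [t for t, _ in ordered]
    # First node token >= the given token; past the end wraps to the first node.
    idx = bisect_left(tokens, token)
    return ordered[idx % len(ordered)][1]


if __name__ == "__main__":
    for part in plan_partitions(RING):
        print(part)
```

A scheduler that runs each partition on its preferred_node reads only rows
stored locally on that node, which is the effect described above.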

Maybe others on the list can share their thoughts about it?

Regards

Duy Hai DOAN



On 10 Sept 2014 17:35, "Oleg Ruchovets" <oruchovets@gmail.com> wrote:

> Hi,
>   I am trying to evaluate different options for Spark + Cassandra and I
> have a couple of questions.
>   My aim is to use Cassandra + Spark without Hadoop:
>
> 1) Is it possible to use only Cassandra as the input/output for
> PySpark?
> 2) If I use Spark (Java, Scala), is it possible to use only
> Cassandra for input/output, without Hadoop?
> 3) I know there are a couple of strategies for the storage level. If my
> data set is quite big and I do not have enough memory to process it, can
> I use the DISK_ONLY option without Hadoop (having only Cassandra)?
> 4) Please share your experience: how stable is the Cassandra + Spark
> integration?
>
> Thanks
> Oleg
>
