cassandra-user mailing list archives

From Ashic Mahtab <>
Subject RE: Data platform support
Date Tue, 17 May 2016 06:54:33 GMT
If Spark workers are installed on the same nodes as Cassandra, they can take advantage
of data locality, greatly reducing the amount of network IO in Spark jobs. If you use a separate
Cloudera / Hortonworks / EMR cluster, you won't be able to benefit from this. Other than
the locality issue, you can run Spark jobs from external clusters just fine. I've used both
approaches, and for particular types of jobs, I've found a "custom" cluster with Spark Master(s)
+ n*[Spark Worker + Cassandra] to be very effective.
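
To make the locality point concrete, here is a toy sketch in plain Python. It is not Spark or Cassandra code: the function `remote_reads`, the node names, and the replica layout are all hypothetical, chosen only to illustrate how co-locating workers with replicas can avoid network reads.

```python
# Toy model only: every name here is hypothetical, not a Spark or Cassandra API.

def remote_reads(partition_owners, task_hosts):
    """Count partitions whose task runs on a host that does not hold the data."""
    return sum(
        1
        for partition, host in task_hosts.items()
        if host not in partition_owners[partition]
    )

# Assumed layout: three Cassandra nodes, each token range replicated on two of them.
partition_owners = {
    "range-0": {"node-a", "node-b"},
    "range-1": {"node-b", "node-c"},
    "range-2": {"node-c", "node-a"},
}

# Co-located cluster: a Spark worker runs on every Cassandra node,
# so each task can be placed on a replica of its token range.
colocated = {"range-0": "node-a", "range-1": "node-b", "range-2": "node-c"}

# Separate cluster: workers live on other hosts, so every read crosses the network.
external = {"range-0": "spark-1", "range-1": "spark-2", "range-2": "spark-3"}

print(remote_reads(partition_owners, colocated))  # 0
print(remote_reads(partition_owners, external))   # 3
```

In this simplified model the co-located layout needs no remote reads, while the external layout fetches every partition over the network. Real schedulers and replica placement are more involved.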

Date: Tue, 10 May 2016 17:13:25 +0100
Subject: Re: Data platform support

I understand that Spark supports HDFS and standalone modes. The recommendation from Cassandra
is that Spark should be installed in standalone mode in the SMACK framework.
On 10 May 2016 at 16:24, Sruti S <> wrote:
Not sure what is meant. Spark can access HDFS. Why is it in standalone mode? Please clarify.
On Tue, May 10, 2016 at 11:08 AM, Srini Sydney <> wrote:
I have a clarification based on your answer:
Spark is installed in standalone mode (not HDFS) in the SMACK framework. Our data lake is in HDFS.
How do we overcome this?

  - cheers sreeni

On 10 May 2016, at 08:16, vincent gromakowski <> wrote:

Maybe a SMACK stack would be a better option for using Spark with Cassandra...
On 10 May 2016 at 8:45 AM, "Srini Sydney" <> wrote:
Thanks a lot, Denise.
On 10 May 2016 at 02:42, Denise Rogers <> wrote:
It really depends on how close you want to stay to the most current versions from the open source community.

Cloudera has tended to build more products that require their distribution to not be as current
with open source product versions.




> On May 9, 2016, at 8:21 PM, Srini Sydney <> wrote:
>
> Hi guys
>
> We are thinking of using one of the 3 big data platforms, i.e. Hortonworks, MapR or Cloudera.
> We will use Hadoop, Hive, ZooKeeper, and Spark in these platforms.
>
> Which platform would be better suited for Cassandra?
>
> - sreeni