cassandra-user mailing list archives

From Jonathan Haddad <>
Subject Re: Spark and intermediate results
Date Fri, 09 Oct 2015 14:34:41 GMT
You can run Spark against your Cassandra data directly, without using a
shared filesystem.
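The reply doesn't name a specific tool, but the standard way to do this is the DataStax spark-cassandra-connector, which reads and writes Cassandra tables as RDDs with no HDFS or NFS involved. A minimal sketch of Marcelo's CF A -> CF B job (keyspace, table, and column names here are hypothetical; requires the spark-cassandra-connector on the classpath):

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CfAToCfB {
  def main(args: Array[String]): Unit = {
    // Point the connector at the Cassandra cluster; no shared filesystem needed.
    val conf = new SparkConf()
      .setAppName("cf-a-to-cf-b")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Read rows from CF A, apply a transformation, and write the result to CF B.
    sc.cassandraTable("my_keyspace", "cf_a")
      .map(row => (row.getString("key"), row.getString("value").toUpperCase))
      .saveToCassandra("my_keyspace", "cf_b", SomeColumns("key", "value"))

    sc.stop()
  }
}
```

Because the connector partitions the Cassandra token range across Spark executors, the job can run in Spark standalone mode with no Hadoop installation at all.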

On Fri, Oct 9, 2015 at 6:09 AM Marcelo Valle (BLOOMBERG/ LONDON) <> wrote:

> Hello,
> I saw this nice link from an event:
> I would like to test using Spark to perform some operations on a column
> family; my objective is to read from CF A and write the output of my M/R
> job to CF B.
> That said, I've read this in Spark's FAQ:
> "Do I need Hadoop to run Spark?
> No, but if you run on a cluster, you will need some form of shared file
> system (for example, NFS mounted at the same path on each node). If you
> have this type of filesystem, you can just deploy Spark in standalone mode.
> "
> The question I ask is - if I don't want to have an HDFS installation just
> to run Spark on Cassandra, is my only option to have this NFS mounted over
> the network?
> It doesn't seem smart to me to use something like NFS to store Spark
> files, as it would probably affect performance, and at the same time I
> wouldn't want to run an additional HDFS cluster just to run jobs on
> Cassandra.
> Is there a way of using Cassandra itself as this "some form of shared
> file system"?
> -Marcelo
> << ideas don't deserve respect >>
