cassandra-user mailing list archives

From Jonathan Haddad <...@jonhaddad.com>
Subject Re: Spark and intermediate results
Date Fri, 09 Oct 2015 14:34:41 GMT
You can run Spark against your Cassandra data directly, without a
shared filesystem.

https://github.com/datastax/spark-cassandra-connector
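
As a rough illustration of the CF A to CF B flow described below, here is a
minimal PySpark sketch. It assumes a recent Spark release with the connector
package on the classpath and uses the connector's DataFrame source
("org.apache.spark.sql.cassandra"). The keyspace name, the table names, the
host address, and the helper functions are all hypothetical placeholders,
not anything taken from the original setup.

```python
def cassandra_options(keyspace, table):
    """Options the connector's DataFrame source takes to address one table."""
    return {"keyspace": keyspace, "table": table}


def copy_table(spark, keyspace, source, target):
    """Read every row of `source` and append it to `target` in Cassandra."""
    rows = (spark.read
            .format("org.apache.spark.sql.cassandra")
            .options(**cassandra_options(keyspace, source))
            .load())
    # A real job would transform `rows` here before writing them out.
    (rows.write
     .format("org.apache.spark.sql.cassandra")
     .options(**cassandra_options(keyspace, target))
     .mode("append")
     .save())


def run_example():
    """Hypothetical entry point; needs pyspark and the connector installed."""
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("cf-a-to-cf-b")
             # Placeholder contact point for the Cassandra cluster.
             .config("spark.cassandra.connection.host", "127.0.0.1")
             .getOrCreate())
    try:
        # Placeholder keyspace and table names.
        copy_table(spark, "my_keyspace", "cf_a", "cf_b")
    finally:
        spark.stop()
```

Both the input and the output of the job live in Cassandra here, so the
sketch itself never reads from or writes to HDFS or NFS. The target table is
assumed to exist already with a compatible schema.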


On Fri, Oct 9, 2015 at 6:09 AM Marcelo Valle (BLOOMBERG/ LONDON) <
mvallemilita@bloomberg.net> wrote:

> Hello,
>
> I saw this nice link from an event:
>
>
> http://www.datastax.com/dev/blog/zen-art-spark-maintenance?mkt_tok=3RkMMJWWfF9wsRogvqzIZKXonjHpfsX56%2B8uX6GylMI%2F0ER3fOvrPUfGjI4GTcdmI%2BSLDwEYGJlv6SgFSrXMMblswLgIXBY%3D
>
> I would like to test using Spark to perform some operations on a column
> family. My objective is to read from CF A and write the output of my M/R
> job to CF B.
>
> That said, I've read this from Spark's FAQ (
> http://spark.apache.org/faq.html):
>
> "Do I need Hadoop to run Spark?
> No, but if you run on a cluster, you will need some form of shared file
> system (for example, NFS mounted at the same path on each node). If you
> have this type of filesystem, you can just deploy Spark in standalone mode.
> "
>
> The question I ask is: if I don't want to have an HDFS installation just to
> run Spark on Cassandra, is my only option to have this NFS mounted over
> the network?
> It doesn't seem smart to me to use something like NFS to store Spark files,
> as it would probably affect performance, and at the same time I wouldn't
> like to have an additional HDFS cluster just to run jobs on Cassandra.
> Is there a way of using Cassandra itself as this "some form of shared
> file system"?
>
> -Marcelo
>
>
> << ideas don't deserve respect >>
>
