crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Crunch Spark with YARN cluster Manager
Date Tue, 17 Jun 2014 15:09:19 GMT
Hey Christian,

I posted an example (word count, of course) of running Spark 0.9.0 on a
cluster to my GitHub repo, but it's pre-YARN:

https://github.com/jwills/crunch-demo/tree/spark

Use the spark-run.sh script to run it; you need to set -Dspark.master on
the command line to point at the Spark master on the cluster. It would be
cool to integrate it with the instructions here for running Spark under
YARN and see how it comes out:

http://spark.apache.org/docs/latest/running-on-yarn.html
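As a rough illustration of the -Dspark.master approach, here is a minimal sketch of how a driver might pick up the master URL passed as a JVM system property, falling back to local mode when none is given. The class and method names are hypothetical, not taken from the crunch-demo repo. The Crunch hand-off in the comment assumes a `SparkPipeline(String master, String appName)` constructor. For Spark 1.0 on YARN, the linked docs use master values such as `yarn-client` instead of a `spark://host:port` URL.

```java
// Hypothetical helper (not from crunch-demo): resolves the Spark master URL
// supplied on the command line as -Dspark.master=..., defaulting when absent.
public class SparkMasterConfig {

    /** Returns the trimmed spark.master system property, or defaultMaster if unset/blank. */
    static String resolveMaster(String defaultMaster) {
        String master = System.getProperty("spark.master");
        if (master == null || master.trim().isEmpty()) {
            return defaultMaster;
        }
        return master.trim();
    }

    public static void main(String[] args) {
        String master = resolveMaster("local");
        // Assumption: in a Crunch driver this string would be passed to the
        // SparkPipeline constructor, e.g. new SparkPipeline(master, "wordcount").
        System.out.println("Using Spark master: " + master);
    }
}
```

For example, launching the JVM with `-Dspark.master=spark://master-host:7077` makes `resolveMaster("local")` return that URL. Without the flag it returns `local`.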

Of course, we'd need to commit that patch to upgrade Crunch to Spark 1.0.0:
https://issues.apache.org/jira/browse/CRUNCH-410

J


On Tue, Jun 17, 2014 at 7:47 AM, Christian Tzolov <
christian.tzolov@gmail.com> wrote:

> Is there an example of a Crunch Spark pipeline for the Hadoop 2/YARN cluster
> manager?
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
