crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Crunch Spark with YARN cluster Manager
Date Mon, 23 Jun 2014 00:00:31 GMT
Yes, please. Thanks, Christian!


On Sun, Jun 22, 2014 at 4:35 PM, Christian Tzolov <
christian.tzolov@gmail.com> wrote:

> Hi Josh,
>
> After applying the https://issues.apache.org/jira/browse/CRUNCH-410 patch,
> I've managed to submit a Crunch-Spark pipeline to a Hadoop 2.2.0 cluster
> using the YARN cluster manager :)
>
> The run configuration looks like this:
>
> export HADOOP_CONF_DIR=<your hadoop conf dir>
> export SPARK_SUBMIT_CLASSPATH=./commons-codec-1.4.jar:<your spark installation folder>/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:./<your application>-jar-with-dependencies.jar
> <your spark installation folder>/bin/spark-submit \
>   --num-executors 10 \
>   --master yarn-client \
>   --class <crunch pipeline main class> \
>   ./<your application>-jar-with-dependencies.jar <your application arguments>
>
> (Note the commons-codec in the spark classpath!)
>
> I've noticed that the Crunch-Spark runtime doesn't implement the
> Converter#applyPTypeTransforms logic, so I've put together a patch
> (attached) to make my custom sources work.
>
> Shall I open a ticket and try to provide a complete patch for
> Converter#applyPTypeTransforms?
>
> Cheers,
> Christian
>
>
>
> On Wed, Jun 18, 2014 at 1:32 PM, Christian Tzolov <
> christian.tzolov@gmail.com> wrote:
>
>> Hi Josh,
>>
>> Thanks for the references. I've applied the patch and started
>> experimenting with Crunch-Spark on YARN, playing around with the
>> yarn-client and yarn-cluster master configurations. Not there yet.
>>
>> Cheers,
>> Christian
>>
>>
>>
>>
>> On Tue, Jun 17, 2014 at 5:09 PM, Josh Wills <jwills@cloudera.com> wrote:
>>
>>> Hey Christian,
>>>
>>> I posted an example (word count, of course) to my GitHub repo of
>>> running Spark 0.9.0 on a cluster, but it's pre-YARN:
>>>
>>> https://github.com/jwills/crunch-demo/tree/spark
>>>
>>> Use the spark-run.sh script to run it; you need to set -Dspark.master on
>>> the command line to point at the Spark master on the cluster. It would be
>>> cool to integrate it with the instructions here for running Spark under
>>> YARN and see how it turns out:
>>>
>>> http://spark.apache.org/docs/latest/running-on-yarn.html
>>>
>>> Of course, we'd need to commit that patch to upgrade Crunch to Spark
>>> 1.0.0: https://issues.apache.org/jira/browse/CRUNCH-410
>>>
>>> J
>>>
>>>
>>> On Tue, Jun 17, 2014 at 7:47 AM, Christian Tzolov <
>>> christian.tzolov@gmail.com> wrote:
>>>
>>>> Is there an example of a Crunch Spark pipeline for the Hadoop 2/YARN
>>>> cluster manager?
>>>>
>>>
>>>
>>>
>>> --
>>> Director of Data Science
>>> Cloudera <http://www.cloudera.com>
>>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>>
>>
>>
>


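The yarn-client run configuration quoted above can be wrapped in a small POSIX shell function so the classpath and spark-submit flags are assembled in one place. This is a minimal sketch, not something shipped with Crunch or with the CRUNCH-410 patch: the function names crunch_classpath and crunch_spark_submit, the DRY_RUN switch, and the example paths and class names are hypothetical. The jar names (commons-codec 1.4, the Spark 1.0.0 assembly built for Hadoop 2.2.0) and the spark-submit flags are taken from Christian's configuration.

```shell
#!/bin/sh
# Sketch of the Crunch-Spark yarn-client launch from this thread.
# Function names, DRY_RUN, and all example paths are hypothetical.

# Build SPARK_SUBMIT_CLASSPATH from the Spark install dir ($1) and the
# application's jar-with-dependencies ($2). commons-codec must be on it,
# as noted in the thread.
crunch_classpath() {
  printf '%s' "./commons-codec-1.4.jar:$1/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:$2"
}

# Usage: crunch_spark_submit SPARK_HOME APP_JAR MAIN_CLASS [app args...]
# Set DRY_RUN=1 to print the spark-submit command instead of running it.
crunch_spark_submit() {
  # Fail early if HADOOP_CONF_DIR is not set (first line of the config).
  : "${HADOOP_CONF_DIR:?HADOOP_CONF_DIR must point at your Hadoop conf dir}"
  export HADOOP_CONF_DIR

  spark_home=$1
  app_jar=$2
  main_class=$3
  shift 3

  SPARK_SUBMIT_CLASSPATH=$(crunch_classpath "$spark_home" "$app_jar")
  export SPARK_SUBMIT_CLASSPATH

  # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty.
  ${DRY_RUN:+echo} "$spark_home/bin/spark-submit" \
    --num-executors 10 \
    --master yarn-client \
    --class "$main_class" \
    "$app_jar" "$@"
}
```

For example, `HADOOP_CONF_DIR=/etc/hadoop/conf DRY_RUN=1 crunch_spark_submit /opt/spark ./my-app-jar-with-dependencies.jar com.example.MyPipeline input output` prints the command without launching anything. Dropping DRY_RUN runs spark-submit with the same arguments.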
