beam-issues mailing list archives

From "Beam JIRA Bot (Jira)" <>
Subject [jira] [Commented] (BEAM-11378) Cannot run Python PortableRunner on EMR cluster
Date Sun, 14 Feb 2021 17:16:02 GMT


Beam JIRA Bot commented on BEAM-11378:

This issue was marked "stale-P2" and has not received a public comment in 14 days. It is now
automatically moved to P3. If you are still affected by it, you can comment and move it back
to P2.

> Cannot run Python PortableRunner on EMR cluster
> -----------------------------------------------
>                 Key: BEAM-11378
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Ratul Ray
>            Priority: P3
> I have been trying to run the Python word-count example on an [AWS EMR|] cluster, and it does not work.
> Things I have tried:
>  * Running with 
> {code:bash}
> python3 py_codes/ --output word_count_output --runner=SparkRunner
> {code}
> This results in implicitly running with {{--spark-master-url local[4]}}, which defeats the purpose of running it on a cluster.
>  * Tried
> {code:bash}
> python3 py_codes/ --output word_count_output --runner=SparkRunner --spark-master-url=yarn
> {code}
> This still uses the local master.
>  * Could not use the method described in [] under "Running on a pre-deployed Spark cluster", because with YARN the master is not exposed at a URL like localhost:7077.
>  * Tried
> {code:bash}
> python3 py_codes/ --output word_count_output --runner=SparkRunner --output_executable_path=jars/beam_word_count.jar
> {code}
> as described in
>  This creates a jar file, but when I submit the jar with spark-submit I get a Docker "permission denied" exception. Possibly related to
> *So, is there no way to run Python Beam code on a YARN Spark cluster?*
>  This would also mean there is no way to run TFX code (which uses Beam) on a YARN cluster.
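
One possible factor in the second attempt above, offered as an assumption rather than a diagnosis: the Beam Python SDK parses pipeline options with argparse, and to the best of my knowledge it registers the Spark master option as --spark_master_url (underscores) with a default of local[4]. If that holds, the hyphenated spelling is not recognized and the default stays in effect. The sketch below reproduces that behavior with a stand-in parser; parse_master_url is a hypothetical helper, not part of Beam, and the sketch says nothing about whether a YARN master is accepted once the option is parsed.

```python
import argparse


def parse_master_url(argv):
    """Parse argv with a stand-in parser (hypothetical, not Beam's own class).

    Assumption: the option is registered with underscores and defaults to
    'local[4]', mirroring what the Beam Python SDK is believed to do.
    """
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--spark_master_url", default="local[4]")
    # parse_known_args returns unrecognized flags instead of raising an error.
    known, unknown = parser.parse_known_args(argv)
    return known.spark_master_url, unknown


# Hyphenated spelling, as in the second attempt: not recognized, default kept.
print(parse_master_url(["--spark-master-url=yarn"]))

# Underscore spelling: recognized by the stand-in parser.
print(parse_master_url(["--spark_master_url=yarn"]))
```

With the stand-in parser, the hyphenated flag ends up in the list of unrecognized arguments and the returned master stays at the default. Only the underscore form changes the value.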

This message was sent by Atlassian Jira
