airflow-commits mailing list archives

From "Albertus Kelvin (Jira)" <j...@apache.org>
Subject [jira] [Assigned] (AIRFLOW-6212) SparkSubmitHook failed to execute spark-submit to standalone cluster
Date Tue, 10 Dec 2019 08:02:00 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Albertus Kelvin reassigned AIRFLOW-6212:
----------------------------------------

    Assignee:     (was: Albertus Kelvin)

> SparkSubmitHook failed to execute spark-submit to standalone cluster
> --------------------------------------------------------------------
>
>                 Key: AIRFLOW-6212
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6212
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hooks, operators
>    Affects Versions: 1.10.6
>            Reporter: Albertus Kelvin
>            Priority: Trivial
>
> I was trying to submit a PySpark job with spark-submit using SparkSubmitOperator. I had already set the master via the AIRFLOW_CONN_SPARK_DEFAULT environment variable, with a value of the form *spark://host:port*.
> However, an exception occurred: 
> {noformat}
> airflow.exceptions.AirflowException: Cannot execute: ['path/to/spark-submit', '--master', 'host:port', 'job.py']
> {noformat}
> It turns out that the master passed to spark-submit must have *spark://* preceding the host:port. I checked the code and found that the hook does not handle this:
> {code:python}
> conn = self.get_connection(self._conn_id)
> if conn.port:
>     conn_data['master'] = "{}:{}".format(conn.host, conn.port)
> else:
>     conn_data['master'] = conn.host
> {code}
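
The following is a minimal sketch of how the scheme could get lost, assuming the connection URI is split into scheme, host, and port in the way the standard library's `urllib.parse.urlsplit` does it. That assumption is not confirmed by this message, and `split_conn_uri` and `example-host` are hypothetical names used only for illustration.

```python
from urllib.parse import urlsplit


def split_conn_uri(uri):
    """Split a connection URI into (scheme, host, port).

    Hypothetical helper: it only illustrates standard URI splitting and is
    not Airflow's own parsing code.
    """
    parts = urlsplit(uri)
    return parts.scheme, parts.hostname, parts.port


# The scheme ends up in its own field, so host and port no longer carry it.
scheme, host, port = split_conn_uri("spark://example-host:7077")

# Rebuilding the master from host and port alone, as in the quoted snippet,
# therefore drops the "spark://" prefix.
master = "{}:{}".format(host, port) if port else host
print(scheme, master)
```

If the parsing works this way, the `host:port` value in the exception above is the expected result of joining only the host and port fields.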
> I think the protocol should be added explicitly, for example:
> {code:python}
> conn_data['master'] = "spark://{}:{}".format(conn.host, conn.port)
> {code}
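
A slightly more defensive variant of the suggested change can be sketched as below. It is an illustration only, not the fix that was applied in Airflow: `build_master` is a hypothetical helper, and the choice to leave port-less values untouched is an assumption made so that masters such as `yarn` or `local[*]` are not rewritten.

```python
import re

# Matches a leading URI scheme such as "spark://", "mesos://" or "k8s://".
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def build_master(host, port=None):
    """Build a value for spark-submit's --master option (hypothetical helper).

    - No port: return the host unchanged (covers values like "yarn").
    - Host already has a scheme: keep it and append the port.
    - Otherwise: assume a standalone cluster and prepend "spark://".
    """
    if port is None:
        return host
    if _SCHEME_RE.match(host):
        return "{}:{}".format(host, port)
    return "spark://{}:{}".format(host, port)


print(build_master("host", 7077))          # bare host and port
print(build_master("mesos://host", 5050))  # scheme already present
print(build_master("yarn"))                # non-URL master, left as-is
```

Prepending `spark://` unconditionally, as in the one-line suggestion, would also change masters that use another scheme and would format a missing port as `None`, which is why this sketch checks both cases first.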



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
