airflow-commits mailing list archives

From "Ash Berlin-Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-1184) Contrib Spark Submit Hook does not split argument and argument value
Date Tue, 20 Jun 2017 09:23:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055434#comment-16055434 ]

Ash Berlin-Taylor commented on AIRFLOW-1184:
--------------------------------------------

The PR associated with this issue needs to be (partly) reverted, as it makes it _impossible_ to
include an argument with an embedded space. The behaviour reported in this issue is user error
in how the SparkSubmitOperator is being called.

{code}
application="com.example.MyClass"
args=[
  "--foo=bar",
  "--qux=foo bar",
]
task = SparkSubmitOperator(
    task_id="spark_csv_parser",
    dag=dag,
    application="com-example-spark.jar",
    application_args=args,
    java_class="com.example.MyClass",
)
{code}

Prior to this PR, this would do the equivalent of:

{code}
popen([
  "spark-submit",
  ...,
  "com-example-spark.jar",
  "com.example.MyClass",
  "--foo=bar",
  "--qux=foo bar"
])
{code}

But after this commit, the last argument is split in two at the embedded space:

{code}
popen([
  "spark-submit",
  ...,
  "com-example-spark.jar",
  "com.example.MyClass",
  "--foo=bar",
  "--qux=foo",
  "bar"
])
{code}
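
The difference can be reproduced without Spark. The following is a minimal sketch, not the hook's actual code: `naive_split` is a hypothetical stand-in that assumes the PR splits every argument on whitespace, and `echo_argv` is a hypothetical helper that starts a child Python process and reports the argument list it received.

```python
import json
import subprocess
import sys


def naive_split(args):
    """Hypothetical stand-in for the post-PR behaviour (an assumption):
    split every argument on whitespace before building the command."""
    result = []
    for arg in args:
        result.extend(arg.split())
    return result


def echo_argv(args):
    """Run a child Python process and return the argv it actually received."""
    child_code = "import json, sys; print(json.dumps(sys.argv[1:]))"
    cmd = [sys.executable, "-c", child_code] + list(args)
    return json.loads(subprocess.check_output(cmd).decode())


if __name__ == "__main__":
    args = ["--foo=bar", "--qux=foo bar"]
    # List-form commands are passed through as-is: the embedded space survives.
    print(echo_argv(args))
    # After whitespace splitting, the child sees three arguments instead of two.
    print(echo_argv(naive_split(args)))
```

Because the command is given as a list, no shell quoting is involved, so the caller alone decides where one argument ends and the next begins.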


The correct way to pass an option and its value as separate arguments, as done in the tests, is this:


{code}
            'application_args': [
                '-f', ' foo',
                '--bar', 'bar',
                '--start', '{{ macros.ds_add(ds, -1)}}',
                '--end', '{{ ds }}'
            ]
{code}
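
If the options start out as flag/value pairs, the flat list can be built on the caller's side rather than by splitting strings. A minimal sketch, assuming a hypothetical helper `build_application_args` (not part of Airflow) and reusing the option names from the logs quoted in the issue:

```python
def build_application_args(options):
    """Hypothetical helper: flatten (flag, value) pairs into a list in which
    each flag and each value is its own element. Values are never split,
    so embedded spaces are preserved."""
    args = []
    for flag, value in options:
        args.append(flag)
        args.append(str(value))
    return args


if __name__ == "__main__":
    application_args = build_application_args([
        ("--begin", "2017-05-07"),
        ("--end", "2017-05-08"),
        ("--db_name", "mydb"),
    ])
    # Each flag and each value ends up as a separate list element.
    print(application_args)
```

The resulting list can then be passed as `application_args` in the same way as the list in the test excerpt.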

> Contrib Spark Submit Hook does not split argument and argument value
> --------------------------------------------------------------------
>
>                 Key: AIRFLOW-1184
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1184
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: contrib, hooks
>    Affects Versions: Airflow 2.0, Airflow 1.8
>            Reporter: Vianney FOUCAULT
>            Assignee: Vianney FOUCAULT
>             Fix For: Airflow 2.0, Airflow 1.8
>
>
> Python's Popen expects the command as a list, and so does spark-submit: 
> * ['--option value'] 
> is not the same as 
> * ['--option', 'value']
> as far as Spark is concerned, e.g. in the Spark logs (yarn logs):
> Error: Unknown option --end 2017-05-08
> Error: Unknown option --begin 2017-05-07
> Error: Unknown option --db_name mydb
> Error: Missing option --begin
> Error: Missing option --end
> Error: Missing option --db_name



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
