hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-9160) [Submarine] Document "PYTHONPATH" environment variable setting when using -localization options
Date Sun, 06 Jan 2019 19:18:00 GMT

     [ https://issues.apache.org/jira/browse/YARN-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-9160:
    Fix Version/s: 3.2.1

> [Submarine] Document "PYTHONPATH" environment variable setting when using -localization options
> -----------------------------------------------------------------------------------------------
>                 Key: YARN-9160
>                 URL: https://issues.apache.org/jira/browse/YARN-9160
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zhankun Tang
>            Assignee: Zhankun Tang
>            Priority: Major
>             Fix For: 3.3.0, 3.2.1
>         Attachments: YARN-9160-trunk.001.patch
> An infrastructure platform might want to provide the user with a Zeppelin notebook and run the user's
job from a command the user types, such as "python entry_script.py ...". This is better for the end
user because "entry_script.py" then appears to live in the user's local workbench.
> When the platform submits the job, this may translate to the following Submarine command:
> {code:java}
> ... job run
>   --localization entry_script.py:./
>   --localization dependency_script1.py:./
>   --localization dependency_script2.py:./
>   --worker_launch_cmd "python entry_script.py .."
> {code}
> Or:
> {code:java}
> ... job run
>   --localization entry_script.py:./
>   --localization dependency_scripts_dir:./
>   --worker_launch_cmd "python entry_script.py .."
> {code}
> Both of the above commands fail with a module import error raised from entry_script.py.
This is because YARN only creates symbolic links in the container's work dir (the real script
files are in different cache folders). Python resolves the entry script's symbolic link when it
computes its module search path, so the module import won't know where to find the dependency scripts.
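The failure can be reproduced locally without YARN or Submarine. The following is a minimal sketch, assuming a POSIX system where symbolic links can be created and a CPython interpreter. The folder names `cache_1`, `cache_2` and `work_dir` and the helper `run_entry_through_symlinks` are hypothetical stand-ins for illustration, not names used by YARN.

```python
import os
import subprocess
import sys
import tempfile


def run_entry_through_symlinks():
    """Run entry_script.py via a symlink while its dependency sits in another folder."""
    root = tempfile.mkdtemp()
    # Hypothetical stand-ins for the per-resource cache folders and the work dir.
    entry_cache = os.path.join(root, "cache_1")
    dep_cache = os.path.join(root, "cache_2")
    work_dir = os.path.join(root, "work_dir")
    for directory in (entry_cache, dep_cache, work_dir):
        os.makedirs(directory)

    # The real files live in two different cache folders.
    with open(os.path.join(entry_cache, "entry_script.py"), "w") as f:
        f.write("import dependency_script1\nprint(dependency_script1.VALUE)\n")
    with open(os.path.join(dep_cache, "dependency_script1.py"), "w") as f:
        f.write("VALUE = 'imported'\n")

    # The work dir holds only symbolic links to the real files.
    os.symlink(os.path.join(entry_cache, "entry_script.py"),
               os.path.join(work_dir, "entry_script.py"))
    os.symlink(os.path.join(dep_cache, "dependency_script1.py"),
               os.path.join(work_dir, "dependency_script1.py"))

    env = dict(os.environ)
    env.pop("PYTHONPATH", None)  # make sure no search path is inherited
    return subprocess.run(
        [sys.executable, "entry_script.py"],
        cwd=work_dir, env=env, capture_output=True, text=True,
    )


if __name__ == "__main__":
    result = run_entry_through_symlinks()
    print(result.returncode)
    print(result.stderr)
```

Even though both links sit side by side in the work dir, the interpreter searches the folder that holds the real entry script, so the import of `dependency_script1` is expected to fail.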
> One possible solution is to localize a single directory containing all the scripts and
change the worker_launch_cmd to "cd scripts_dir && python entry_script.py". But this
solution degrades the user experience, because it no longer feels like a local workbench.
> Another solution is to use the "PYTHONPATH" environment variable. This keeps the user
experience good and requires no changes to YARN localization internals.
> {code:java}
> ... job run
>  # the entry point
>  --localization entry_script.py:<path>/entry_script.py
>  # the dependency Python scripts of the entry point
>  --localization dependency_scripts_dir:<path>/dependency_scripts_dir
>  # the PYTHONPATH env to make dependency available to entry script
>  --env PYTHONPATH="<path>/dependency_scripts_dir"
>  --worker_launch_cmd "python <path>/entry_script.py ..."{code}
> And we should document this.
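The effect of the PYTHONPATH setting can be checked locally with a small simulation. This is only a sketch under stated assumptions: it needs a POSIX system with symbolic links, it does not invoke YARN or Submarine, and the names `build_layout`, `run_entry`, `dep_helper`, `cache_entry`, `cache_deps` and `localized` are hypothetical. The `localized` folder plays the role of `<path>` in the command above.

```python
import os
import subprocess
import sys
import tempfile


def build_layout(root):
    """Create real files in separate cache folders and symlink them under
    root/localized, which stands in for <path>. Returns that folder."""
    entry_cache = os.path.join(root, "cache_entry")
    deps_cache = os.path.join(root, "cache_deps", "dependency_scripts_dir")
    localized = os.path.join(root, "localized")
    for directory in (entry_cache, deps_cache, localized):
        os.makedirs(directory)

    with open(os.path.join(entry_cache, "entry_script.py"), "w") as f:
        f.write("import dep_helper\nprint(dep_helper.VALUE)\n")
    with open(os.path.join(deps_cache, "dep_helper.py"), "w") as f:
        f.write("VALUE = 'imported via PYTHONPATH'\n")

    # Only symbolic links are visible under the localized path.
    os.symlink(os.path.join(entry_cache, "entry_script.py"),
               os.path.join(localized, "entry_script.py"))
    os.symlink(deps_cache, os.path.join(localized, "dependency_scripts_dir"))
    return localized


def run_entry(localized, use_pythonpath):
    """Run the entry script, optionally pointing PYTHONPATH at the dependency dir."""
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)  # start from a clean search path
    if use_pythonpath:
        env["PYTHONPATH"] = os.path.join(localized, "dependency_scripts_dir")
    return subprocess.run(
        [sys.executable, os.path.join(localized, "entry_script.py")],
        env=env, capture_output=True, text=True,
    )


if __name__ == "__main__":
    path = build_layout(tempfile.mkdtemp())
    print(run_entry(path, use_pythonpath=False).returncode)
    print(run_entry(path, use_pythonpath=True).stdout)
```

Without PYTHONPATH the import of `dep_helper` fails; with it, the dependency directory is added to the module search path and the entry script runs, which mirrors the `--env PYTHONPATH=...` option in the command above.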

This message was sent by Atlassian JIRA