airflow-commits mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-1893) PYTHONPATH is not propagated to `run_as_user` context, affecting DAGs using the custom packages
Date Tue, 12 Dec 2017 00:49:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286879#comment-16286879 ]

ASF subversion and git services commented on AIRFLOW-1893:
----------------------------------------------------------

Commit 9d9727a80a3948615a4085d5168c24394fde5c84 in incubator-airflow's branch refs/heads/master
from [~erod]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=9d9727a ]

[AIRFLOW-1893][AIRFLOW-1901] Propagate PYTHONPATH when using impersonation

When using impersonation via `run_as_user`, the PYTHONPATH environment
variable is not propagated, so there may be issues when DAGs depend on
specific custom packages. This PR propagates only PYTHONPATH, if set,
from the process creating the impersonated sub-process.

Tested in a staging environment; the impersonation tests in Airflow are
not very portable and fixing them would take additional work, so that is
left as a TODO tracked in JIRA:
https://issues.apache.org/jira/browse/AIRFLOW-1901

Closes #2860 from edgarRd/erod-pythonpath_run_as_user
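The failure mode this commit addresses can be reproduced outside Airflow: a child interpreter launched without PYTHONPATH in its environment does not see directories that only PYTHONPATH contributed to `sys.path`, which is why the impersonated task cannot import the custom package. A minimal sketch — the `/data/airflow` path is taken from the issue's log; the probe itself is illustrative:

```python
import os
import subprocess
import sys

# Ask a child interpreter whether /data/airflow made it onto sys.path.
probe = "import sys; print('/data/airflow' in sys.path)"

# Parent environment with PYTHONPATH set, and a copy with it dropped,
# mimicking what sudo does when it scrubs the environment.
with_pp = dict(os.environ, PYTHONPATH="/data/airflow")
without_pp = {k: v for k, v in with_pp.items() if k != "PYTHONPATH"}

kept = subprocess.run([sys.executable, "-c", probe],
                      env=with_pp, capture_output=True, text=True)
dropped = subprocess.run([sys.executable, "-c", probe],
                         env=without_pp, capture_output=True, text=True)

print(kept.stdout.strip())     # True  -- custom package dir is visible
print(dropped.stdout.strip())  # False -- imports fail, as in the log below
```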


> PYTHONPATH is not propagated to `run_as_user` context, affecting DAGs using the custom packages
> ----------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-1893
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1893
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Edgar Rodriguez
>            Assignee: Edgar Rodriguez
>
> When running DAGs with {{run_as_user}}, the {{PYTHONPATH}} environment variable is not available in the user's context, because {{sudo}} wipes out environment variables. For instance, a DAG using a custom package will fail with the following exception:
> {code}
> [2017-12-06 01:50:08,183] {base_task_runner.py:92} INFO - Subtask: [2017-12-06 01:50:08,183] {models.py:271} INFO - Processed file is not a zip file
> [2017-12-06 01:50:08,184] {base_task_runner.py:92} INFO - Subtask: [2017-12-06 01:50:08,184] {models.py:423} INFO - Processing dag_folder as file
> [2017-12-06 01:50:08,184] {base_task_runner.py:92} INFO - Subtask: [2017-12-06 01:50:08,184] {models.py:251} INFO - Processing filepath /data/airflow/test_run_as_user.py
> [2017-12-06 01:50:08,184] {base_task_runner.py:92} INFO - Subtask: [2017-12-06 01:50:08,184] {models.py:271} INFO - Processed file is not a zip file
> [2017-12-06 01:50:08,185] {base_task_runner.py:92} INFO - Subtask: [2017-12-06 01:50:08,185] {models.py:293} ERROR - Failed to import: /data/airflow/test_run_as_user.py
> [2017-12-06 01:50:08,185] {base_task_runner.py:92} INFO - Subtask: Traceback (most recent call last):
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 290, in process_file
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:     m = imp.load_source(mod_name, filepath)
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:   File "/data/airflow/test_run_as_user.py", line 7, in <module>
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:     from contrib.date_utils import ds_replace
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask: ImportError: No module named contrib.date_utils
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask: Traceback (most recent call last):
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:   File "/usr/local/bin/airflow", line 28, in <module>
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:     args.func(args)
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 349, in run
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:     dag = get_dag(args)
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 132, in get_dag
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:     'parse.'.format(args.dag_id))
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask: airflow.exceptions.AirflowException: dag_id could not be found: test_run_as_user. Either the dag did not exist or it failed to parse.
> [2017-12-06 01:51:07,258] {jobs.py:186} DEBUG - [heartbeat]
> {code}
> *Possible location of the issue in Airflow:*
> {{airflow/airflow/task_runner/base_task_runner.py}}
> *Resolution:*
> Since {{sudo}} wipes out environment variables for security reasons, instead of using the {{-E}} flag to propagate all variables, we can pass just the {{PYTHONPATH}} variable within the command, so the impersonated process has access to the same Python packages as the process spawning the {{sudo}} command.
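The resolution above can be sketched as follows. This is an illustrative reconstruction, not the actual Airflow code: the function name, the specific `sudo` flags, and the user name are assumptions; the key idea is appending a leading `VAR=value` argument (which `sudo` accepts, subject to policy) rather than using `sudo -E`.

```python
import os
import shlex

def build_impersonated_command(run_as_user, task_cmd):
    """Wrap a task command in sudo for impersonation (illustrative sketch).

    sudo scrubs the caller's environment by default, so PYTHONPATH is
    forwarded explicitly on the command line instead of via `sudo -E`,
    which would propagate *all* environment variables.
    """
    cmd = ["sudo", "-H", "-u", run_as_user]
    pythonpath = os.environ.get("PYTHONPATH")
    if pythonpath:
        # Propagate only PYTHONPATH, and only if the parent process set it.
        cmd.append("PYTHONPATH=%s" % shlex.quote(pythonpath))
    return cmd + task_cmd
```

For example, with `PYTHONPATH=/data/airflow` in the parent environment, `build_impersonated_command("etl", ["airflow", "run"])` yields `['sudo', '-H', '-u', 'etl', 'PYTHONPATH=/data/airflow', 'airflow', 'run']`, so the impersonated interpreter sees the same custom package directories.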



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
