airflow-commits mailing list archives

From "Soeren Laursen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AIRFLOW-1947) airflow json file created in /tmp gets wrong permission when using run_as_user
Date Sun, 31 Dec 2017 11:04:03 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Soeren Laursen updated AIRFLOW-1947:
------------------------------------
    Description: 
We are using run_as_user on two specific tasks to make sure that the resulting files are assigned
to the correct user.

If we run the tasks as the Airflow user, they complete as expected.

# DAG START #
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2015, 6, 1),
    'email': ['sln@fcoo.dk'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'queue': 'storage-arch03',
    'dagrun_timeout': timedelta(minutes=60),
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}

dag = DAG('Archive_Sentinel-1_data_from_FCOO_ftp_server',
          default_args=default_args,
          schedule_interval=timedelta(days=1))

archivingTodaysData = BashOperator(
    task_id='Archive_todays_data',
    bash_command='/home/airflow/airflowScripts/archive-Sentinel-1-data.sh 0 ',
    dag=dag)

archivingYesterdaysData = BashOperator(
    task_id='Archive_yesterdays_data',
    bash_command='/home/airflow/airflowScripts/archive-Sentinel-1-data.sh 1 ',
    dag=dag)

# First archive the newest data, then the data from yesterday.
archivingYesterdaysData.set_upstream(archivingTodaysData)

# DAG END #
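Note that the pasted DAG does not actually show the run_as_user setting described above. Assuming it is applied per task (run_as_user is a BaseOperator argument), the affected operators would presumably look like this sketch ('prod' being the impersonated user from the report):

```python
# Hypothetical sketch: run_as_user set per task so the task (and its output
# files) run as 'prod' rather than the airflow user. Not from the original
# report, which omits this line.
archivingTodaysData = BashOperator(
    task_id='Archive_todays_data',
    bash_command='/home/airflow/airflowScripts/archive-Sentinel-1-data.sh 0 ',
    run_as_user='prod',
    dag=dag)
```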

When we run the task as a user called prod via run_as_user, the file is generated
in /tmp:
-rw-------  1 airflow airflow 2205 dec 19 11:46 tmpicu87_au

But the prod user cannot read the file. From the log file we have:
[2017-12-19 11:46:31,803] {base_task_runner.py:112} INFO - Running: ['bash', '-c', 'sudo -H -u prod airflow run Archive_Sentinel-1_data_from_FCOO_ftp_server Archive_yesterdays_data 2017-12-19T00:00:00 --job_id 1047 --raw -sd DAGS_FOLDER/archive-Sentinel-1-data-from-ftp-server.py --cfg_path /tmp/tmpicu87_au']
[2017-12-19 11:46:32,463] {base_task_runner.py:95} INFO - Subtask: [2017-12-19 11:46:32,462] {__init__.py:57} INFO - Using executor SequentialExecutor
[2017-12-19 11:46:32,587] {base_task_runner.py:95} INFO - Subtask: [2017-12-19 11:46:32,587] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python3.5/lib2to3/Grammar.txt
[2017-12-19 11:46:32,630] {base_task_runner.py:95} INFO - Subtask: [2017-12-19 11:46:32,630] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python3.5/lib2to3/PatternGrammar.txt
[2017-12-19 11:46:33,124] {base_task_runner.py:95} INFO - Subtask: /usr/local/lib/python3.5/dist-packages/airflow/www/app.py:23: FlaskWTFDeprecationWarning: "flask_wtf.CsrfProtect" has been renamed to "CSRFProtect" and will be removed in 1.0.
[2017-12-19 11:46:33,124] {base_task_runner.py:95} INFO - Subtask:   csrf = CsrfProtect()
[2017-12-19 11:46:33,344] {base_task_runner.py:95} INFO - Subtask: Traceback (most recent call last):
[2017-12-19 11:46:33,344] {base_task_runner.py:95} INFO - Subtask:   File "/usr/local/bin/airflow", line 28, in <module>
[2017-12-19 11:46:33,344] {base_task_runner.py:95} INFO - Subtask:     args.func(args)
[2017-12-19 11:46:33,344] {base_task_runner.py:95} INFO - Subtask:   File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 329, in run
[2017-12-19 11:46:33,344] {base_task_runner.py:95} INFO - Subtask:     with open(args.cfg_path, 'r') as conf_file:
[2017-12-19 11:46:33,344] {base_task_runner.py:95} INFO - Subtask: PermissionError: [Errno 13] Permission denied: '/tmp/tmpicu87_au'
[2017-12-19 11:46:36,770] {jobs.py:2125} INFO - Task exited with return code 1
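The -rw------- listing above suggests the root cause: the --cfg_path file is created with Python's tempfile module, which (via mkstemp) creates files with mode 0600 owned by the creating user (here airflow), so the sudo'ed prod process gets EACCES when it opens the file. A minimal sketch reproducing the permission mode, with a possible workaround; the chmod call is an assumption for illustration, not necessarily how Airflow itself resolved this issue:

```python
import os
import stat
import tempfile

# NamedTemporaryFile uses mkstemp under the hood, which always creates the
# file with mode 0600: readable only by the creating user (here: airflow).
with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp:
    tmp.write('{"core": {"dags_folder": "/home/airflow/dags"}}')
    cfg_path = tmp.name

mode = stat.S_IMODE(os.stat(cfg_path).st_mode)
print(oct(mode))  # 0o600 -> a different user (prod) cannot open() it

# Hedged workaround: loosen the mode before the `sudo -u prod` subprocess
# reads it. (World-readable is a trade-off: the cfg file may hold secrets.)
os.chmod(cfg_path, 0o644)

os.unlink(cfg_path)  # cleanup
```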







> airflow json file created in /tmp gets wrong permission when using run_as_user
> ------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-1947
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1947
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DagRun
>    Affects Versions: Airflow 1.8
>         Environment: ubuntu 16.04 LTS
>            Reporter: Soeren Laursen
>            Priority: Critical
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
