airflow-commits mailing list archives

From "Joe Schmid (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AIRFLOW-1011) Task Instance Results not stored for SubDAG Tasks
Date Sun, 19 Mar 2017 14:33:41 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Schmid updated AIRFLOW-1011:
--------------------------------
    Description: 
In previous Airflow versions, results for tasks executed as a subdag were written as rows
to task_instances. In Airflow 1.8 only rows for tasks inside the top-level DAG (non-subdag
tasks) seem to get written to the database.

As a result, the status, logs, etc. of task instances inside the subdag cannot be checked
from the UI.

Attached is a simple test DAG that exhibits the issue along with screenshots showing the UI
differences between v1.8 and v1.7.1.3.
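
One way to confirm the symptom outside the UI is to count rows per dag_id in the metadata
database. The following is only a minimal sketch under stated assumptions: it uses an
in-memory SQLite stand-in rather than a real Airflow database, it assumes the table is named
task_instance (the description above says task_instances) with a dag_id column, and it
assumes subdag tasks are stored under the dag_id '<parent>.<subdag task_id>', as in the
attached test DAG. The helper names are hypothetical and not part of Airflow.

```python
import sqlite3

# Names taken from the attached test DAG.
PARENT_DAG_ID = "Test_SubDAG"
SUBDAG_TASK_ID = "SubDagOp"


def subdag_dag_id(parent_dag_id, subdag_task_id):
    """Build the subdag's dag_id the same way the test DAG does."""
    return "{}.{}".format(parent_dag_id, subdag_task_id)


def count_task_instances(conn, dag_id):
    """Count rows recorded for one dag_id (assumed table: task_instance)."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM task_instance WHERE dag_id = ?", (dag_id,)
    )
    return cur.fetchone()[0]


def build_demo_db():
    """Hypothetical stand-in for the metadata DB, mimicking the reported 1.8 state:
    only the three top-level tasks have rows, the two subdag tasks have none."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance ("
        "task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT)"
    )
    rows = [
        ("DAG_Task1", PARENT_DAG_ID, "2017-03-19", "success"),
        (SUBDAG_TASK_ID, PARENT_DAG_ID, "2017-03-19", "success"),
        ("DAG_Task2", PARENT_DAG_ID, "2017-03-19", "success"),
    ]
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for dag_id in (PARENT_DAG_ID, subdag_dag_id(PARENT_DAG_ID, SUBDAG_TASK_ID)):
        print(dag_id, count_task_instances(conn, dag_id))
```

Against a real installation, the same SELECT could be run on the configured metadata
database; a count of zero for 'Test_SubDAG.SubDagOp' after a run would match the behavior
described here, while v1.7.1.3 would be expected to show rows for both subdag tasks.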

  was:
In previous Airflow versions, results for tasks executed as a subdag were written as rows
to task_instances. In Airflow 1.8 only rows for tasks inside the top-level DAG (non-subdag
tasks) seem to get written to the database.

This results in being unable to check the status of task instances inside the subdag from
the UI, check the logs for those task instances from the UI, etc.

Here is a simple test DAG that exhibits the issue:

------------------------------------------------------------------------

from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator
from airflow.models import DAG
from datetime import datetime, timedelta

args = {
    'owner': 'airflow',
    'start_date': datetime(2016, 3, 1),
}

DAG_NAME = 'Test_SubDAG'
SUBDAG_OP = 'SubDagOp'


def get_test_subdag():
    subdag = DAG(
        dag_id='{}.{}'.format(DAG_NAME, SUBDAG_OP), default_args=args,
        schedule_interval="@daily")  # This is ignored, but it can't be None or @once

    first = DummyOperator(
        task_id='SubDAG_Task1',
        dag=subdag
    )

    last = DummyOperator(
        task_id='SubDAG_Task2',
        dag=subdag
    )
    first >> last
    return subdag

dag = DAG(
    dag_id=DAG_NAME, default_args=args,
    schedule_interval=None,
    dagrun_timeout=timedelta(hours=1))

run_first = DummyOperator(
    task_id='DAG_Task1',
    dag=dag
)

run_subdag = SubDagOperator(
    subdag=get_test_subdag(),
    task_id=SUBDAG_OP,
    dag=dag
)

run_last = DummyOperator(
    task_id='DAG_Task2',
    dag=dag
)

run_first >> run_subdag
run_subdag >> run_last



> Task Instance Results not stored for SubDAG Tasks
> -------------------------------------------------
>
>                 Key: AIRFLOW-1011
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1011
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: backfill, subdag
>    Affects Versions: Airflow 1.8
>            Reporter: Joe Schmid
>            Priority: Critical
>         Attachments: 1-TopLevelDAGTaskInstancesShownCorrectly.png, 2-ZoomedSubDAG-NoTaskInstances-v1.8.png,
>                      3-ZoomedSubDAG-TaskInstances-v1.7.1.3.png, test_subdag.py
>
>
> In previous Airflow versions, results for tasks executed as a subdag were written as
> rows to task_instances. In Airflow 1.8 only rows for tasks inside the top-level DAG
> (non-subdag tasks) seem to get written to the database.
> As a result, the status, logs, etc. of task instances inside the subdag cannot be
> checked from the UI.
> Attached is a simple test DAG that exhibits the issue along with screenshots showing
> the UI differences between v1.8 and v1.7.1.3.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
