airflow-commits mailing list archives

From "Joe Schmid (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AIRFLOW-1011) Task Instance Results not stored for SubDAG Tasks
Date Sun, 19 Mar 2017 14:28:41 GMT
Joe Schmid created AIRFLOW-1011:
-----------------------------------

             Summary: Task Instance Results not stored for SubDAG Tasks
                 Key: AIRFLOW-1011
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1011
             Project: Apache Airflow
          Issue Type: Bug
          Components: backfill, subdag
    Affects Versions: Airflow 1.8
            Reporter: Joe Schmid
            Priority: Critical
         Attachments: 1-TopLevelDAGTaskInstancesShownCorrectly.png, 2-ZoomedSubDAG-NoTaskInstances-v1.8.png,
3-ZoomedSubDAG-TaskInstances-v1.7.1.3.png

In previous Airflow versions, results for tasks executed inside a subdag were written as rows
to the task_instance table. In Airflow 1.8, only rows for tasks in the top-level DAG (non-subdag
tasks) appear to be written to the database.

As a result, the UI cannot show the status of task instances inside the subdag, their logs,
or anything else that depends on those rows.

Here is a simple test DAG that exhibits the issue:

------------------------------------------------------------------------

from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator
from airflow.models import DAG
from datetime import datetime, timedelta

args = {
    'owner': 'airflow',
    'start_date': datetime(2016, 3, 1),
}

DAG_NAME = 'Test_SubDAG'
SUBDAG_OP = 'SubDagOp'


def get_test_subdag():
    # A subdag's dag_id must be '<parent dag_id>.<SubDagOperator task_id>'
    subdag = DAG(
        dag_id='{}.{}'.format(DAG_NAME, SUBDAG_OP), default_args=args,
        schedule_interval="@daily")  # This is ignored, but it can't be None or @once

    first = DummyOperator(
        task_id='SubDAG_Task1',
        dag=subdag
    )

    last = DummyOperator(
        task_id='SubDAG_Task2',
        dag=subdag
    )
    first >> last
    return subdag

dag = DAG(
    dag_id=DAG_NAME, default_args=args,
    schedule_interval=None,
    dagrun_timeout=timedelta(hours=1))

run_first = DummyOperator(
    task_id='DAG_Task1',
    dag=dag
)

run_subdag = SubDagOperator(
    subdag=get_test_subdag(),
    task_id=SUBDAG_OP,
    dag=dag
)

run_last = DummyOperator(
    task_id='DAG_Task2',
    dag=dag
)

run_first >> run_subdag
run_subdag >> run_last
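
The symptom can also be checked outside the UI by querying the metadata database for rows
under the subdag's dag_id. The following is only a sketch, not part of the original report.
It assumes the table is named task_instance with columns task_id, dag_id, execution_date
and state, and it uses an in-memory SQLite table as a hypothetical stand-in for the real
database. The helper names, the sample execution date and the sample states are illustrative.

```python
import sqlite3

# Assumption: Airflow's metadata table is `task_instance` with these columns.
# "?" is sqlite3's placeholder; other DB-API drivers may expect "%s" instead.
QUERY = (
    "SELECT task_id, execution_date, state "
    "FROM task_instance WHERE dag_id = ? "
    "ORDER BY task_id, execution_date"
)


def task_instance_rows(conn, dag_id):
    """Return (task_id, execution_date, state) tuples stored for dag_id."""
    cur = conn.cursor()
    cur.execute(QUERY, (dag_id,))
    return cur.fetchall()


def build_demo_db():
    """Hypothetical stand-in database that mimics the reported 1.8 symptom:
    rows exist for the top-level DAG, none for the subdag."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance ("
        "task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT)"
    )
    # Illustrative rows only; the date and states are made up for the demo.
    conn.executemany(
        "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
        [
            ("DAG_Task1", "Test_SubDAG", "2017-03-19", "success"),
            ("SubDagOp", "Test_SubDAG", "2017-03-19", "success"),
            ("DAG_Task2", "Test_SubDAG", "2017-03-19", "success"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(task_instance_rows(conn, "Test_SubDAG"))           # top-level tasks
    print(task_instance_rows(conn, "Test_SubDAG.SubDagOp"))  # subdag tasks
```

Against a real deployment, the same query would be run through whatever connection the
metadata database uses. An empty result for 'Test_SubDAG.SubDagOp' after a successful run
would match the behavior described in this report.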




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
