airflow-commits mailing list archives

From "Jeremiah Lowin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-47) ExternalTaskSensor causes scheduling dead lock
Date Tue, 10 May 2016 23:16:13 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279194#comment-15279194
] 

Jeremiah Lowin commented on AIRFLOW-47:
---------------------------------------

That may be right, though my impression was that the first attempt was not using CeleryExecutor
and the second attempt (where this issue manifested) was. That's very different from an "airflow"
deadlock though, because if tasks are running, Airflow won't declare a deadlock. It sounds
more like a question of resource allocation, which would probably be better served by refactoring
the DAG so that it isn't so dependent on external task sensors.

[~hilaviz] for example, instead of using 24 external_task_sensors, you could use a single
task in your daily DAG that blocks until all 24 hourly tasks have finished. You could
do that either by having the hourly tasks drop XComs, which the sensor task would look for,
or by accessing the Airflow DB directly and checking their state (that's more complicated though).
That suggestion only applies if the resource question is a real issue, though.
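
A minimal sketch of the single-blocking-task idea, in plain Python with no Airflow imports. The names hourly_execution_dates, all_hourly_done, and the get_state lookup are hypothetical; in a real DAG the lookup would be backed by XComs or by a query against the Airflow DB, and the "success" state string is an assumption here.

```python
from datetime import datetime, timedelta


def hourly_execution_dates(day_start):
    """Return the 24 hourly execution dates that fall within one day."""
    return [day_start + timedelta(hours=h) for h in range(24)]


def all_hourly_done(day_start, get_state):
    """True only if every hourly run for the day reports success.

    get_state is a hypothetical callable (execution_date -> state string);
    it stands in for an XCom lookup or a query against the Airflow DB.
    """
    return all(get_state(d) == "success" for d in hourly_execution_dates(day_start))


if __name__ == "__main__":
    day = datetime(2016, 4, 1)
    # Pretend only the first two hourly runs have finished, as in the report.
    finished = {day, day + timedelta(hours=1)}
    print(all_hourly_done(day, lambda d: "success" if d in finished else None))
```

A sensor-style task could call all_hourly_done on each poke and return its result, so the daily DAG needs one waiting task rather than 24.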

However, my belief right now is that the DAG as constructed is trying to sense tasks with
the same execution_date, whereas what Hila actually wants is to sense tasks from earlier execution_dates.
I think you need to use the {{execution_delta}} parameter to look at different execution_dates.
So at the moment I'm not convinced this is a true bug at all :)
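
A sketch of the date arithmetic involved, assuming the sensor looks for the external task at its own execution_date minus execution_delta (and at the same execution_date when no delta is given). The helper sensed_execution_date is hypothetical and not part of Airflow.

```python
from datetime import datetime, timedelta


def sensed_execution_date(execution_date, execution_delta=None):
    """Execution date of the external task the sensor would look for.

    Assumption: the delta is subtracted from the sensor's own execution_date;
    with no delta, the same execution_date is used.
    """
    if execution_delta is None:
        return execution_date
    return execution_date - execution_delta


if __name__ == "__main__":
    daily_run = datetime(2016, 4, 2)
    # No delta: look for the hourly run with the same timestamp.
    print(sensed_execution_date(daily_run))
    # One-hour delta: look for the hourly run one hour earlier.
    print(sensed_execution_date(daily_run, timedelta(hours=1)))
```

In the DAG itself this would correspond to passing something like execution_delta=timedelta(hours=1) to ExternalTaskSensor; which delta is right depends on which hourly run the daily DAG should wait for.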

> ExternalTaskSensor causes scheduling dead lock
> ----------------------------------------------
>
>                 Key: AIRFLOW-47
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-47
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: operators, scheduler
>    Affects Versions: Airflow 1.7.0
>         Environment: CentOS 6.5
> Airflow 1.7.0 with SequentialExecutor
>            Reporter: Hila Visan
>
> We are trying to use 'ExternalTaskSensor' to coordinate between a daily DAG and an hourly
> DAG (daily DAGs depend on hourly).
> Relevant code: 
> *Daily DAG definition:*
> {code:title=2_daily_dag.py|borderStyle=solid}
> default_args = {
>     …
>     'start_date': datetime(2016, 4, 2),
>     …
> }
> dag = DAG(dag_id='2_daily_agg', default_args=default_args, schedule_interval="@daily")
> ext_dep = ExternalTaskSensor(
>     external_dag_id='1_hourly_agg',
>     external_task_id='print_hourly1',
>     task_id='evening_hours_sensor',
>     dag=dag)
> {code}
> *Hourly DAG definition:*
> {code:title=1_hourly_dag.py|borderStyle=solid}
> default_args = {
>     …
>     'start_date': datetime(2016, 4, 1),
>     …
> }
> dag = DAG(dag_id='1_hourly_agg', default_args=default_args, schedule_interval="@hourly")
> t1 = BashOperator(
>     task_id='print_hourly1',
>     bash_command='echo hourly job1',
>     dag=dag)
> {code}
> The hourly dag was executed twice for the following execution dates:
> 04-01T00:00:00	
> 04-01T01:00:00
> Then the daily dag was executed, and is still running....	 
> According to logs, daily dag is waiting for hourly dag to complete:
> {noformat}
> [2016-05-04 06:01:20,978] {models.py:1041} INFO - Executing<Task(ExternalTaskSensor): evening_hours_sensor> on 2016-04-03 00:00:00
> [2016-05-04 06:01:20,984] {sensors.py:188} INFO - Poking for 1_hourly_agg.print_hourly1 on 2016-04-02 00:00:00 ...
> [2016-05-04 06:02:21,053] {sensors.py:188} INFO - Poking for 1_hourly_agg.print_hourly1 on 2016-04-02 00:00:00 ...
> {noformat}
> How can I solve this deadlock?
> In addition, I didn't understand whether this means that the daily DAG depends only on the "last"
> hourly DAG run of the same day (23:00-24:00)?
> What happens if the hourly DAG run for another hour fails?
> Thanks a lot! 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
