airflow-commits mailing list archives

From "Hila Visan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-47) ExternalTaskSensor causes scheduling dead lock
Date Wed, 11 May 2016 05:26:12 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279570#comment-15279570
] 

Hila Visan commented on AIRFLOW-47:
-----------------------------------

Hi [~jlowin]
Thanks for your reply.
I am new to Airflow and am still learning the material (so I am not familiar with XCom yet).
Regarding the above issue:
Why did you say that I have 24 _ExternalTaskSensor_ tasks? I have one _ExternalTaskSensor_ per day.
As I already wrote, there are currently 16 tasks that were scheduled to run but are waiting
for another hourly task.
I attached a screenshot of the "running" tasks: 16 "daily_sensor" (ExternalTaskSensor) tasks,
all of them waiting for the last hourly task of the specific day to run.
Log example of one of the tasks (2016-04-02 00:00:00):
{noformat}
Attempt 1 out of 4
--------------------------------------------------------------------------------

[2016-05-11 04:58:38,534] {models.py:1041} INFO - Executing <Task(ExternalTaskSensor): daily_sensor> on 2016-04-02 00:00:00
[2016-05-11 04:58:38,567] {sensors.py:188} INFO - Poking for 1_hourly_agg.print_hourly1 on 2016-04-02 00:00:00 ...
[2016-05-11 04:59:38,667] {sensors.py:188} INFO - Poking for 1_hourly_agg.print_hourly1 on 2016-04-02 00:00:00 ...
[2016-05-11 05:00:38,758] {sensors.py:188} INFO - Poking for 1_hourly_agg.print_hourly1 on 2016-04-02 00:00:00 ...
[2016-05-11 05:01:38,855] {sensors.py:188} INFO - Poking for 1_hourly_agg.print_hourly1 on 2016-04-02 00:00:00 ...
[2016-05-11 05:02:38,933] {sensors.py:188} INFO - Poking for 1_hourly_agg.print_hourly1 on 2016-04-02 00:00:00 ...
[2016-05-11 05:03:38,994] {sensors.py:188} INFO - Poking for 1_hourly_agg.print_hourly1 on 2016-04-02 00:00:00 ...
[2016-05-11 05:04:39,057] {sensors.py:188} INFO - Poking for 1_hourly_agg.print_hourly1 on 2016-04-02 00:00:00 ...
[2016-05-11 05:05:39,174] {sensors.py:188} INFO - Poking for 1_hourly_agg.print_hourly1 on 2016-04-02 00:00:00 ...
[2016-05-11 05:06:39,766] {sensors.py:188} INFO - Poking for 1_hourly_agg.print_hourly1 on 2016-04-02 00:00:00 ...
[2016-05-11 05:07:39,842] {sensors.py:188} INFO - Poking for 1_hourly_agg.print_hourly1 on 2016-04-02 00:00:00 ...
{noformat}  
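
To make the log above concrete, here is a minimal sketch of how the poked date relates to the sensor's own execution date. It uses only the Python standard library, not Airflow itself. The helper name `poked_execution_date` and its `delta` argument are made up for illustration, and the "same date by default" behaviour is an assumption read off the log lines, not taken from the sensor's source.

```python
from datetime import datetime, timedelta


def poked_execution_date(execution_date, delta=None):
    """Return the external execution date a sensor run would wait for.

    Hypothetical helper: with no delta it returns the sensor's own
    execution date, which matches what the log above shows.
    """
    if delta is None:
        return execution_date
    return execution_date - delta


daily_run = datetime(2016, 4, 2)

# Default: the daily run for 2016-04-02 pokes the hourly run stamped 00:00.
same_date = poked_execution_date(daily_run)

# A negative delta of 23 hours points at the 23:00 hourly run of the same day.
last_hour = poked_execution_date(daily_run, timedelta(hours=-23))

print(same_date)
print(last_hour)
```

Under these assumptions, each daily sensor run waits for exactly one hourly run, so its result says nothing about the other 23 hours of that day.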

I'll try to change my code to make it work, but I still think that there is an issue here.
If you need more info/logs let me know.

Thanks
Hila 

> ExternalTaskSensor causes scheduling dead lock
> ----------------------------------------------
>
>                 Key: AIRFLOW-47
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-47
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: operators, scheduler
>    Affects Versions: Airflow 1.7.0
>         Environment: CentOS 6.5
> Airflow 1.7.0 with SequentialExecutor
>            Reporter: Hila Visan
>         Attachments: screenshot-1.png
>
>
> We are trying to use 'ExternalTaskSensor' to coordinate between a daily DAG and an hourly DAG (the daily DAG depends on the hourly one).
> Relevant code: 
> *Daily DAG definition:*
> {code:title=2_daily_dag.py|borderStyle=solid}
> default_args = {
>     …
>     'start_date': datetime(2016, 4, 2),
>     …
> }
> dag = DAG(dag_id='2_daily_agg', default_args=default_args, schedule_interval="@daily")
> ext_dep = ExternalTaskSensor(
>     external_dag_id='1_hourly_agg',
>     external_task_id='print_hourly1',
>     task_id='evening_hours_sensor',
>     dag=dag)
> {code}
> *Hourly DAG definition:*
> {code:title=1_hourly_dag.py|borderStyle=solid}
> default_args = {
>     …
>     'start_date': datetime(2016, 4, 1),
>     …
> }
> dag = DAG(dag_id='1_hourly_agg', default_args=default_args, schedule_interval="@hourly")
> t1 = BashOperator(
>     task_id='print_hourly1',
>     bash_command='echo hourly job1',
>     dag=dag)
> {code}
> The hourly dag was executed twice for the following execution dates:
> 04-01T00:00:00	
> 04-01T01:00:00
> Then the daily dag was executed, and it is still running.
> According to logs, daily dag is waiting for hourly dag to complete:
> {noformat}
> [2016-05-04 06:01:20,978] {models.py:1041} INFO - Executing <Task(ExternalTaskSensor): evening_hours_sensor> on 2016-04-03 00:00:00
> [2016-05-04 06:01:20,984] {sensors.py:188} INFO - Poking for 1_hourly_agg.print_hourly1 on 2016-04-02 00:00:00 ...
> [2016-05-04 06:02:21,053] {sensors.py:188} INFO - Poking for 1_hourly_agg.print_hourly1 on 2016-04-02 00:00:00 ...
> {noformat}
> How can I solve this dead-lock?
> In addition, I didn't understand whether this means that the daily dag depends only on the "last" hourly dag run of the same day (23:00-24:00)?
> What happens if the hourly dag run for another hour fails?
> Thanks a lot! 
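
One possible reading of the dead lock described above, sketched as a toy model rather than Airflow's actual scheduler: if only one task can run at a time (as with SequentialExecutor) and a poking sensor holds that single slot, the task it is waiting for never gets a chance to run. The function `run_one_slot`, the tuple layout, and the task name `hourly_23` are all hypothetical and exist only for this illustration.

```python
from collections import deque


def run_one_slot(queue, done, max_pokes=3):
    """Run queued tasks one at a time; a sensor holds the slot while it pokes.

    Each queue entry is (kind, name, waits_for). Toy model only: nothing
    else can run while a sensor is poking, so `done` cannot change.
    """
    log = []
    pending = deque(queue)
    while pending:
        kind, name, waits_for = pending.popleft()
        if kind == "sensor":
            for _ in range(max_pokes):
                log.append("poke " + waits_for)
                if waits_for in done:
                    break  # upstream task finished, sensor succeeds
            else:
                # Every poke failed: the sensor gives up and frees the slot.
                log.append("timeout " + name)
                continue
        done.add(name)
        log.append("done " + name)
    return log


sensor = ("sensor", "daily_sensor", "hourly_23")
hourly = ("task", "hourly_23", None)

# Sensor queued first: it occupies the only slot until it times out.
blocked = run_one_slot([sensor, hourly], set())

# Hourly task queued first: the sensor succeeds on its first poke.
ordered = run_one_slot([hourly, sensor], set())

print(blocked)
print(ordered)
```

In this model the outcome depends only on queue order, which is why a sensor without a timeout could wait forever when the hourly run is scheduled behind it.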



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
