airflow-commits mailing list archives

From "Michal TOMA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-194) Task hangs in up_for_retry state for very long
Date Thu, 04 Aug 2016 09:25:20 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407473#comment-15407473 ]

Michal TOMA commented on AIRFLOW-194:
-------------------------------------

Hi Chris,

I checked your screenshot.

Sorry for not noticing it in the issue earlier.
In fact, your screenshot shows exactly the problem I'm describing:

Task2 in the first DAG run started at 9h31.
The longest duration of the 4 tasks was 12 minutes and 25 seconds.
This means that all 4 tasks should have finished by 9h44, and the next DAG run should
have started immediately at 9h44.
Instead, in your screenshot it started at 10h01, which means that the previous DAG run
spent 17 minutes doing nothing.
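The timing argument above can be checked with a few lines of Python. This is only a sketch under stated assumptions: the date is an arbitrary placeholder, the start is taken as exactly 9h31:00 because seconds are not visible, and the 4 tasks are assumed to run in parallel, so the DAG run ends when its slowest task ends. `idle_gap` is a hypothetical helper, not part of Airflow.

```python
from datetime import datetime, timedelta


def idle_gap(run_start, longest_task, next_run_start):
    """Return (expected end of the DAG run, idle time before the next run).

    Assumes all tasks start together at run_start and run in parallel,
    so the run is finished once the longest task is finished.
    """
    expected_end = run_start + longest_task
    return expected_end, next_run_start - expected_end


# Hypothetical date; only the times (9h31, 12m25s, 10h01) come from the comment.
start = datetime(2016, 8, 4, 9, 31)
end, gap = idle_gap(start, timedelta(minutes=12, seconds=25), datetime(2016, 8, 4, 10, 1))
print(end.time(), gap)  # 09:43:25 0:17:35
```

The exact end is 9h43:25, which the comment rounds to 9h44, leaving roughly 17 minutes of idle time before the 10h01 run.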

Michal

> Task hangs in up_for_retry state for very long
> ----------------------------------------------
>
>                 Key: AIRFLOW-194
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-194
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: Airflow 1.7.0
>         Environment: Airflow 1.7.0 on RHEL 7 and OpenSuse 13.2
>            Reporter: Michal TOMA
>            Assignee: Siddharth Anand
>         Attachments: screenshot-1.png, screenshot-2.png
>
>
> I can observe this problem on 2 separate Airflow installations.
> The symptoms are:
> - One (and only one) task stays in up_for_retry state even when the last of the retries
> finished with an OK status.
> - It is yellow in the tree view.
> - Execution somehow resumes automatically several hours later.
> - It seems (though this is not certain) to be related to periods when task execution is
> "lagging" behind the normal schedule.
> Here is an example of a task that should run every hour "0 * * * *":
> Current date : 2016-05-30T15:31:00+0200
> ----- Run 1 ------
> Run ID: 2016-05-05T21:00:00
> Task start: 2016-05-30T07:38:XX.XXX
> Task end: 2016-05-30T08:23:XX.XXX
> Marked as success
> ----- Run 2 ------
> Run ID: 2016-05-05T22:00:00
> Task start: 2016-05-30T11:10:XX.XXX
> Task end: 2016-05-30T11:56:XX.XXX
> Marked as success
> ----- Run 3 ------
> Run ID: 2016-05-05T23:00:00
> Task start: 2016-05-30T11:56:XX.XXX
> Task end: 2016-05-30T12:41:XX.XXX
> Marked as success
> ----- Run 4 ------
> Run ID: 2016-05-06T00:00:00
> Task start: 2016-05-30T15:12:XX.XXX
> Task end: (Still running now)
> Marked as running
> There are more than 2 hours between the end of Run-1 and the start of Run-2 (08:23 to 11:10),
> and again between the end of Run-3 and the start of Run-4 (12:41 to 15:12).
> Only Run-3 starts immediately after the end of Run-2, which is the expected behavior since
> the runs are far behind schedule (the Run ID is 2016-05-06 while the current date is 2016-05-30).
> This is a high-priority issue for our setup. I could try to dig deeper into this
> problem, but I have no idea where to look to debug it.
> Any pointers would be more than welcome.
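The gaps in the quoted run listing can be recomputed the same way. This sketch assumes the task-start year is 2016 (matching the report's current date of 2016-05-30) and uses minute precision because the seconds were elided as "XX.XXX". `gaps_between_runs` is a hypothetical helper for illustration, not an Airflow API.

```python
from datetime import datetime

# (run_id, task_start, task_end) from the quoted report; end is None while running.
# Assumption: the year is 2016, and seconds are dropped because they were elided.
runs = [
    ("2016-05-05T21:00:00", "2016-05-30T07:38", "2016-05-30T08:23"),
    ("2016-05-05T22:00:00", "2016-05-30T11:10", "2016-05-30T11:56"),
    ("2016-05-05T23:00:00", "2016-05-30T11:56", "2016-05-30T12:41"),
    ("2016-05-06T00:00:00", "2016-05-30T15:12", None),  # still running
]


def gaps_between_runs(runs):
    """Idle time between the end of each run and the start of the following run."""
    gaps = []
    for (_, _, prev_end), (run_id, next_start, _) in zip(runs, runs[1:]):
        if prev_end is None:  # previous run has not finished; no gap to measure
            continue
        gap = datetime.fromisoformat(next_start) - datetime.fromisoformat(prev_end)
        gaps.append((run_id, gap))
    return gaps


for run_id, gap in gaps_between_runs(runs):
    print(run_id, gap)
```

This prints 2:47:00 for Run-2, 0:00:00 for Run-3 and 2:31:00 for Run-4, so only Run-3 started without an idle gap. `datetime.fromisoformat` needs Python 3.7 or later.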



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
