airflow-commits mailing list archives

From "Michal TOMA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-194) Task hangs in up_for_retry state for very long
Date Thu, 07 Jul 2016 07:16:10 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365722#comment-15365722 ]

Michal TOMA commented on AIRFLOW-194:
-------------------------------------

Did you attach the screen-shot? I can't see it.

> Task hangs in up_for_retry state for very long
> ----------------------------------------------
>
>                 Key: AIRFLOW-194
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-194
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: Airflow 1.7.0
>         Environment: Airflow 1.7.0 on RHEL 7 and OpenSuse 13.2
>            Reporter: Michal TOMA
>            Assignee: Siddharth Anand
>         Attachments: screenshot-1.png
>
>
> I can observe this problem on 2 separate Airflow installations.
> The symptoms are:
> - One (and only one) task stays in the up_for_retry state even when the last of the retries finished with an OK status.
> - It is yellow in the tree view.
> - Execution resumes on its own several hours later.
> - It seems (though I am not certain) to be related to a mode where task execution is "lagging" behind the normal schedule.
> Here is an example of a task that should run every hour (cron schedule "0 * * * *"):
> Current date : 2016-05-30T15:31:00+0200
> ----- Run 1 ------
> Run ID: 2016-05-05T21:00:00
> Task start: 2015-05-30T07:38:XX.XXX
> Task end: 2015-05-30T08:23:XX.XXX
> Marked as success
> ----- Run 2 ------
> Run ID: 2016-05-05T22:00:00
> Task start: 2015-05-30T11:10:XX.XXX
> Task end: 2015-05-30T11:56:XX.XXX
> Marked as success
> ----- Run 3 ------
> Run ID: 2016-05-05T23:00:00
> Task start: 2015-05-30T11:56:XX.XXX
> Task end: 2015-05-30T12:41:XX.XXX
> Marked as success
> ----- Run 4 ------
> Run ID: 2016-05-06T00:00:00
> Task start: 2015-05-30T15:12:XX.XXX
> Task end: (Still running now)
> Marked as running
> There is a gap of nearly 3 hours between the end of Run 1 and the start of Run 2, and of about 2.5 hours between the end of Run 3 and the start of Run 4.
> Only Run 3 starts immediately after the end of Run 2, which is the expected behavior, since the runs are far behind schedule (the Run ID is 2016-05-06 while the current date is 2016-05-30).
> This is a high-priority issue for our setup. I could dig deeper into the problem, but I have no idea where to start looking in order to debug it.
> Any pointers would be more than welcome.
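
For reference, the idle time between consecutive runs in the quoted example can be worked out from the listed timestamps. This is only a minimal sketch: it assumes minute precision (the report elides seconds as XX.XXX), copies the timestamps as they are written in the report, and uses the hypothetical names `runs` and `idle_gaps`, which are not part of Airflow.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M"

# (run ID, task start, task end), copied from the report and truncated to
# minutes. Run 4 has no end time because it was still running.
runs = [
    ("2016-05-05T21:00:00", "2015-05-30T07:38", "2015-05-30T08:23"),
    ("2016-05-05T22:00:00", "2015-05-30T11:10", "2015-05-30T11:56"),
    ("2016-05-05T23:00:00", "2015-05-30T11:56", "2015-05-30T12:41"),
    ("2016-05-06T00:00:00", "2015-05-30T15:12", None),
]


def idle_gaps(runs):
    """Return (run ID, minutes idle between the previous run's end and this run's start)."""
    gaps = []
    for (_, _, prev_end), (run_id, start, _) in zip(runs, runs[1:]):
        delta = datetime.strptime(start, FMT) - datetime.strptime(prev_end, FMT)
        gaps.append((run_id, int(delta.total_seconds() // 60)))
    return gaps


for run_id, minutes in idle_gaps(runs):
    print(f"{run_id}: idle for {minutes} min before start")
```

A gap of zero minutes is what one would expect for every run while the schedule is this far behind; any large non-zero value marks a run that waited before starting.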



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
