airflow-dev mailing list archives

From Gabriel Silk <gs...@dropbox.com.INVALID>
Subject Re: 'Task Instance State' FAILED: Task is in the 'running' state which is not a valid state for execution. The task must be cleared in order to be run.
Date Thu, 11 Apr 2019 00:13:03 GMT
Two questions:
1) Are you eventually seeing the full log for the task, after it finishes?
2) Are you using S3 to store your logs?
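
For reference, S3 log storage around Airflow 1.10 is configured via
the remote logging settings, along these lines (the bucket and
connection id below are placeholders):

    [core]
    remote_logging = True
    remote_base_log_folder = s3://my-bucket/airflow/logs
    remote_log_conn_id = aws_default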

On Thu, Feb 14, 2019 at 11:53 AM Dan Stoner <danstoner@gmail.com> wrote:

> More info!
>
> It appears that the Celery executor will fail silently if the
> credentials for a Postgres result_backend are not valid.
>
> For example, we see:
>
> [2019-02-13 20:45:21,132] {{models.py:1353}} INFO - Dependencies not
> met for <TaskInstance: update_table_progress.update_table
> 2019-02-13T20:30:00+00:00 [running]>, dependency 'Task Instance Not
> Already Running' FAILED: Task is already running, it started on
> 2019-02-13 20:45:09.088978+00:00.
> [2019-02-13 20:45:21,132] {{models.py:1353}} INFO - Dependencies not
> met for <TaskInstance: update_table_progress.update_table
> 2019-02-13T20:30:00+00:00 [running]>, dependency 'Task Instance State'
> FAILED: Task is in the 'running' state which is not a valid state for
> execution. The task must be cleared in order to be run.
> [2019-02-13 20:45:21,135] {{logging_mixin.py:95}} INFO - [2019-02-13
> 20:45:21,134] {{jobs.py:2514}} INFO - Task is not able to be run
>
>
> but there was no database connection failure anywhere in the logs.
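>
> As a sanity check, the backend credentials can be verified directly
> with SQLAlchemy (a minimal sketch; the connection URL is a
> placeholder, the result_backend value minus Celery's "db+" prefix):
>
>     from sqlalchemy import create_engine
>
>     # The engine is lazy; the real connection attempt happens on
>     # connect(), which raises right away if the credentials are bad.
>     engine = create_engine("postgresql://airflow:PASSWORD@db-host:5432/airflow")
>     with engine.connect() as conn:
>         print(conn.execute("SELECT 1").scalar())  # expect: 1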
>
> After fixing our connection string (via
> AIRFLOW__CELERY__RESULT_BACKEND or result_backend in airflow.cfg),
> these issues went away.
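>
> For reference, a working Postgres result backend looks roughly like
> this (placeholder credentials; note the "db+" prefix Celery expects):
>
>     export AIRFLOW__CELERY__RESULT_BACKEND="db+postgresql://airflow:PASSWORD@db-host:5432/airflow"
>
> or equivalently in airflow.cfg:
>
>     [celery]
>     result_backend = db+postgresql://airflow:PASSWORD@db-host:5432/airflow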
>
>
> Sorry I cannot produce a more solid bug report, but hopefully this is
> a breadcrumb for someone.
>
> Dan Stoner
>
> On Wed, Feb 13, 2019 at 10:16 PM Dan Stoner <danstoner@gmail.com> wrote:
> >
> > We saw this, but the task instance state was generally "SUCCESS".
> >
> > In our case, we thought it was due to Redis being used as the results
> > store. There is a WARNING against this right in the operational logs.
> > Google Cloud Composer is, surprisingly, set up in this fashion.
> >
> > We went back to running our own infrastructure with Postgres as the
> > results store, and those issues have not occurred since.
> >
> > The real downside of this error was that our workers were highly
> > underutilized: we were getting terrible overall data throughput, and
> > the workers kept trying to run tasks they couldn't actually run.
> >
> > - Dan Stoner
> >
> >
> > On Wed, Feb 13, 2019 at 4:16 PM Kevin Lam <kevin@fathomhealth.co> wrote:
> > >
> > > Friendly ping on the above! Has anyone encountered this by chance?
> > >
> > > We're still seeing it occasionally on longer-running tasks.
> > >
> > > On Tue, Nov 20, 2018 at 10:31 AM Kevin Lam <kevin@fathomhealth.co> wrote:
> > >
> > > > Hi,
> > > >
> > > > We run Apache Airflow in Kubernetes in a manner very similar to what
> > > > is outlined in puckel/docker-airflow [1] (Celery Executor, Redis for
> > > > messaging, Postgres).
> > > >
> > > > Lately, we've encountered some of our Tasks getting stuck in a
> > > > running state, and printing out the errors:
> > > >
> > > > [2018-11-20 05:31:23,009] {models.py:1329} INFO - Dependencies not
> > > > met for <TaskInstance: BLAH 2018-11-19T19:19:50.757184+00:00
> > > > [running]>, dependency 'Task Instance Not Already Running' FAILED:
> > > > Task is already running, it started on 2018-11-19 23:29:11.974497+00:00.
> > > > [2018-11-20 05:31:23,016] {models.py:1329} INFO - Dependencies not
> > > > met for <TaskInstance: BLAH 2018-11-19T19:19:50.757184+00:00
> > > > [running]>, dependency 'Task Instance State' FAILED: Task is in the
> > > > 'running' state which is not a valid state for execution. The task
> > > > must be cleared in order to be run.
> > > >
> > > > Is there any way to avoid this? Does anyone know what causes this
> > > > issue?
> > > >
> > > > This is quite problematic. When the above error occurs, the task is
> > > > stuck in the running state without making any progress, so turning
> > > > on retries doesn't help with getting our DAGs to reliably run to
> > > > completion.
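> > > >
> > > > In the meantime, a stuck instance can be reset by hand, e.g. via
> > > > the CLI (dag id, task id, and dates below are placeholders):
> > > >
> > > >     airflow clear my_dag -t my_task -s 2018-11-19 -e 2018-11-20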
> > > >
> > > > Thanks!
> > > >
> > > > [1] https://github.com/puckel/docker-airflow
> > > >
>
