airflow-dev mailing list archives

From Maxime Beauchemin <maximebeauche...@gmail.com>
Subject Re: last task in the dag is not running
Date Wed, 03 May 2017 16:25:22 GMT
One way to debug these "false starts" (tasks that never even get to the
point where logging is initialized) is to:
1. look at the scheduler log to find the exact command that is put on
the queue for remote execution
2. copy that exact command
3. go on the worker and recreate the exact context in which the worker
operates (unix user, env vars, shell type, ...)
4. run the command; with luck you have reproduced the false start at
this point (the task does not run)
5. read the pre-log output (stdout) and debug from there (sketched below)
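
For example, here is a minimal sketch of steps 3-5 in Python. The command
string, env vars, and paths are hypothetical placeholders; use whatever
your scheduler log actually shows, and run it as the same unix user the
worker runs under:

    # recreate_false_start.py -- a minimal sketch of steps 3-5; the
    # command string, env vars, and paths below are hypothetical
    import subprocess

    # step 2: the exact command copied from the scheduler log
    cmd = ("airflow run my_dag my_task 2017-05-03T00:00:00 "
           "--local -sd /path/to/dags/my_dag.py")

    # step 3: mirror the env vars the worker process actually sees
    env = {
        'AIRFLOW_HOME': '/home/airflow/airflow',
        'PATH': '/usr/local/bin:/usr/bin:/bin',
    }

    # steps 4-5: run the command and capture the pre-logs (stdout/stderr)
    proc = subprocess.Popen(cmd, shell=True, env=env,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    print(out.decode())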

A common scenario where this happens: the DAG module imports a library
that exists on the scheduler but not in the worker's environment, so the
task cannot even be initiated.
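
For illustration, a hypothetical DAG module that fails this way; pandas
here just stands in for any dependency installed on the scheduler but
missing on a worker:

    # reports_dag.py -- hypothetical example DAG; 'pandas' stands in for
    # any import present on the scheduler but missing on a worker
    from datetime import datetime

    import pandas  # raises ImportError on the worker, killing the
                   # process before task-level logging is ever set up

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    dag = DAG('reports', start_date=datetime(2017, 1, 1))

    def build_report():
        return pandas.DataFrame()

    t = PythonOperator(task_id='build_report',
                       python_callable=build_report,
                       dag=dag)

Because the import sits at module level, the worker dies while parsing
the file, before the task instance can report anything.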

One way to prevent this is to be very careful that the run context for
all of your processes on the cluster is identical. You do not want to
end up in a situation where the Python environment diverges across
boxes, unless you are using queues and are doing so deliberately.
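
A quick way to check for divergence is to fingerprint each box and diff
the outputs; a minimal sketch (it assumes pip is installed in the
environment):

    # env_fingerprint.py -- a minimal sketch: run on the scheduler and
    # on each worker, then diff the outputs to spot diverging
    # python environments
    import subprocess
    import sys

    print(sys.executable)
    print(sys.version)
    # list every installed package with its version
    print(subprocess.check_output(
        [sys.executable, '-m', 'pip', 'freeze']).decode())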

Max

On Wed, May 3, 2017 at 5:39 AM, Bolke de Bruin <bdbruin@gmail.com> wrote:

> Hi Dmitry,
>
> Please provide more information, such as logs and the DAG definition
> itself. This is very little to go on, unfortunately.
>
> Bolke
>
> > On 3 May 2017, at 10:22, Dmitry Smirnov <dmi.smirnov07@gmail.com> wrote:
> >
> > Hi everyone,
> >
> > I'm using Airflow version 1.8.0, just upgraded from 1.7.1.3. The
> > issue I'm going to describe had already started in 1.7.1.3; I
> > upgraded hoping that might resolve it.
> >
> > I have several DAGs for which the *last* task never moves from
> > queued to running.
> > These DAGs used to run fine, but then we had issues with the
> > RabbitMQ cluster we use, and after setting it up again the problem
> > appeared.
> > I'm fairly sure the queue itself is working, since all tasks except
> > the very last one are queued automatically and run fine.
> > As a test, I added a copy of the last task to the DAG;
> > interestingly, the task that used to be last and did not run now
> > runs normally, but the new last task is stuck.
> > I checked the logs at DEBUG level and can see that the scheduler
> > queues the tasks, but they never show up in the corresponding queue
> > on the Celery/Flower dashboard.
> > When I trigger the stuck task from the webserver interface, it does
> > show up in the queue in the Flower dashboard and runs successfully.
> > So, overall, the issue seems to lie with the scheduler rather than
> > the webserver, and it only affects the very last task in the DAG.
> > I'm really stuck now and would welcome any suggestions or ideas on
> > what can be done.
> >
> > Thank you in advance!
> > BR, Dima
> >
> > --
> >
> > Dmitry Smirnov (MSc.)
> > Data Engineer @ Yousician
> > mobile: +358 50 3015072
>
>
