airflow-dev mailing list archives

From Ruiqin Yang <yrql...@gmail.com>
Subject Re: Failover in apache 1.8.0
Date Fri, 20 Jul 2018 08:04:29 GMT
Hi Shubham,

The worker running an Airflow task heartbeats regularly, and each heartbeat
updates the task instance entry in the DB. The scheduler kills task instances
that have gone a long time without a heartbeat (these are called zombie
tasks), and if the task has retries left, it will try to reschedule it
(given all trigger rules are satisfied).
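
As a rough illustration of the zombie handling described above, here is a
minimal Python sketch. It is not Airflow's actual code: the TaskInstance
class, the handle_zombies function, the state strings and the threshold
value are all hypothetical names made up for this example, and a plain list
stands in for the DB.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class TaskInstance:
    """Hypothetical stand-in for a task instance row in the DB."""
    task_id: str
    state: str                 # e.g. "running", "queued", "up_for_retry", "failed"
    last_heartbeat: datetime   # updated by the worker on every heartbeat
    try_number: int = 1        # attempt currently in progress
    max_tries: int = 1         # total attempts allowed


def handle_zombies(tasks: List[TaskInstance],
                   now: datetime,
                   threshold: timedelta) -> List[TaskInstance]:
    """Mark running tasks whose heartbeat is older than `threshold`.

    A zombie with attempts left goes to "up_for_retry" so it can be
    rescheduled; otherwise it is marked "failed".
    """
    zombies = []
    for ti in tasks:
        if ti.state != "running":
            continue  # only running tasks are expected to heartbeat
        if now - ti.last_heartbeat <= threshold:
            continue  # heartbeat is recent enough, task is alive
        ti.state = "up_for_retry" if ti.try_number < ti.max_tries else "failed"
        zombies.append(ti)
    return zombies
```

In this sketch a crashed worker simply stops updating last_heartbeat, so the
next call to handle_zombies picks the task up once the threshold has passed.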

If the workers are under heavy load, the scheduler can still schedule tasks
(that is, put them into the worker queue), and you simply wait for the
workers to pick them up from the queue. If tasks never get picked up and
the scheduler loses track of them, their state is reset to NONE when the
scheduler restarts; these are called orphan tasks.
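
The orphan reset can be sketched in the same spirit. Again, this is only an
illustration under assumptions, not Airflow's implementation: reset_orphans,
the "queued" state string and the dict used in place of the DB are
hypothetical, and Python's None stands in for the NONE state.

```python
from typing import Dict, List, Optional, Set


def reset_orphans(db_states: Dict[str, Optional[str]],
                  tracked: Set[str]) -> List[str]:
    """Reset queued tasks that the scheduler no longer tracks.

    `db_states` maps task id to its state in the DB (hypothetical layout).
    `tracked` holds the task ids the scheduler still knows it has queued;
    right after a restart this set would typically be empty.
    """
    reset = []
    for task_id, state in db_states.items():
        if state == "queued" and task_id not in tracked:
            db_states[task_id] = None  # back to NONE so it can be scheduled again
            reset.append(task_id)
    return reset
```

Tasks that are running or already finished are left alone; only queued
tasks without an owner are returned to the schedulable pool.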

FYI, inside Airbnb, Alex Guziel (@saguziel <https://github.com/saguziel>)
has a patch that requeues tasks if they are not picked up by workers
for a long time, and he plans to open source it.

Cheers,
Kevin Y

On Fri, Jul 20, 2018 at 12:40 AM Shubham Gupta <shubham180695.sg@gmail.com>
wrote:

> Hi,
>
> I would like to know what happens if a Celery worker running one of the
> tasks crashes. Will the job be rescheduled?
>
> Also, if the scheduler is not able to schedule a task on time due to heavy
> load on all workers, what will happen to the task?
>
> Regards
> Shubham Gupta
>
