airflow-dev mailing list archives

From Gael Magnan <gaelmag...@gmail.com>
Subject Re: Scheduler silently dies
Date Sat, 25 Mar 2017 01:20:18 GMT
We encountered the same kind of problem: the scheduler stopped doing its
job, even after rebooting. I thought changing the start date or the state
of a task instance might be to blame, but I've never been able to pinpoint
the problem either.

We are using Celery and Docker, if it helps.

On Sat, 25 Mar 2017 at 01:53, Bolke de Bruin <bdbruin@gmail.com> wrote:

> We have been running *without* num runs for over a year (we have never
> used it). It is a very elusive issue that we have not been able to
> reproduce.
>
> I'd like more info on this, but it needs to be very detailed, even to the
> point of access to a system exhibiting the behavior.
>
> Bolke
>
> > On 24 Mar 2017, at 16:04, Vijay Ramesh <vijay@change.org> wrote:
> >
> > We literally have a cron job that restarts the scheduler every 30 minutes.
> > Num runs didn't work consistently in rc4: sometimes the scheduler would
> > restart itself, and sometimes we'd end up with a few zombie scheduler
> > processes and things would get stuck. We're also running locally, without
> > Celery.
> >
> >> On Mar 24, 2017 16:02, <lrohde@quartethealth.com> wrote:
> >>
> >> We have max runs set and still hit this. Our solution is dumber: we
> >> monitor the log output and kill the scheduler if it stops emitting.
> >> Works like a charm.
> >>
> >>> On Mar 24, 2017, at 5:50 PM, F. Hakan Koklu <fhakan.koklu@gmail.com> wrote:
> >>>
> >>> Some solutions to this problem are restarting the scheduler frequently
> >>> or setting up some sort of monitoring on the scheduler. We have set up a
> >>> DAG that pings cronitor <https://cronitor.io/> (a dead man's snitch type
> >>> of service) every 10 minutes; the snitch pages you when the scheduler
> >>> dies and stops sending pings.
> >>>
> >>> On Fri, Mar 24, 2017 at 1:49 PM, Andrew Phillips <aphillips@qrmedia.com> wrote:
> >>>
> >>>> We use celery and run into it from time to time.
> >>>>>
> >>>>
> >>>> Bang goes my theory ;-) At least, assuming it's the same underlying
> >>>> cause...
> >>>>
> >>>> Regards
> >>>>
> >>>> ap
> >>>>
> >>
>
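
The log-silence watchdog mentioned in the thread could look roughly like the sketch below. It assumes a POSIX host and treats any line the scheduler prints as a sign of life; the function name `watch_once` and the five-minute default threshold are illustrative choices, not part of Airflow. The command runs in its own process group, so a kill also reaches any children the scheduler forked, which may help with the leftover zombie processes described above.

```python
import os
import signal
import subprocess
import sys
import threading
import time


def watch_once(cmd, max_silence=300.0, poll_interval=1.0):
    """Run cmd once, killing its whole process group if it goes quiet.

    Returns True if the process was killed for being silent longer than
    max_silence seconds, False if it exited on its own.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,       # merge stderr, where log lines often go
        encoding="utf-8",
        errors="replace",               # a bad byte must not kill the reader
        start_new_session=True,         # own process group (POSIX only)
        env=dict(os.environ, PYTHONUNBUFFERED="1"),  # avoid block-buffered output
    )
    last_output = time.monotonic()

    def pump():
        nonlocal last_output
        for line in proc.stdout:
            last_output = time.monotonic()  # any output line counts as alive
            sys.stdout.write(line)          # pass the log through unchanged

    threading.Thread(target=pump, daemon=True).start()

    while proc.poll() is None:
        time.sleep(poll_interval)
        if time.monotonic() - last_output > max_silence:
            try:
                # Kill the group leader and any children it forked.
                os.killpg(proc.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the group is already gone
            proc.wait()
            return True
    return False
```

To use it as a supervisor, call it in a loop such as `while True: watch_once(["airflow", "scheduler"]); time.sleep(5)`; a process manager like systemd or supervisord could take over the restart step instead.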

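The heartbeat-DAG approach has two halves, sketched here under the assumption that the monitor accepts plain HTTP GET pings. `send_heartbeat` is what a hypothetical every-10-minutes task would run; in Airflow it would be wrapped in a `PythonOperator`, whose import path differs between Airflow versions, so the DAG itself is left out. `DeadMansSwitch` is a toy stand-in for the check a hosted service like cronitor performs on its side: it alerts once when a ping is overdue and re-arms after the next ping. Both names, the grace period, and any ping URL you pass in are assumptions, not cronitor's API.

```python
import time
import urllib.request


def send_heartbeat(url, timeout=10.0):
    """Body of a hypothetical heartbeat task: hit the monitor's ping URL.

    If the scheduler is dead, this task is never scheduled, so pings simply
    stop arriving; that silence is the signal the monitor acts on.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


class DeadMansSwitch:
    """Toy version of the check a dead man's snitch service runs."""

    def __init__(self, expected_every, grace, clock=time.monotonic):
        self.expected_every = expected_every  # e.g. 600 s for a 10-minute DAG
        self.grace = grace                    # tolerate normal scheduling delay
        self.clock = clock
        self.last_ping = clock()
        self.alerted = False

    def ping(self):
        """Record a heartbeat and re-arm the alert."""
        self.last_ping = self.clock()
        self.alerted = False

    def check(self):
        """Return True exactly once per outage, when a page should go out."""
        overdue = self.clock() - self.last_ping > self.expected_every + self.grace
        if overdue and not self.alerted:
            self.alerted = True
            return True
        return False
```

With a 10-minute schedule, `DeadMansSwitch(expected_every=600, grace=120)` pages once no ping has arrived for more than 12 minutes. Because the ping only happens when a task actually runs, the same silence would also appear if the executor or workers stalled, not just the scheduler.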