airflow-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Re: Scheduler silently dies
Date Sun, 26 Mar 2017 01:16:11 GMT
Please specify what “stop doing its job” means. Does it stop logging entirely? If it still
logs, the scheduler hasn’t died and hasn’t stopped.

B.


> On 24 Mar 2017, at 18:20, Gael Magnan <gaelmagnan@gmail.com> wrote:
> 
> We encountered the same kind of problem with the scheduler that stopped
> doing its job even after rebooting. I thought changing the start date or
> the state of a task instance might be to blame but I've never been able to
> pinpoint the problem either.
> 
> We are using celery and docker if it helps.
> 
> On Sat, 25 Mar 2017 at 01:53, Bolke de Bruin <bdbruin@gmail.com> wrote:
> 
>> We have been running *without* num runs for over a year (and have never
>> used it). It is a very elusive issue that has not been reproducible.
>> 
>> I'd like more info on this, but it needs to be very elaborate, even to the
>> point of access to a system that exhibits the behavior.
>> 
>> Bolke
>> 
>>> On 24 Mar 2017, at 16:04, Vijay Ramesh <vijay@change.org> wrote:
>>> 
>>> We literally have a cron job that restarts the scheduler every 30 min.
>>> Num runs didn't work consistently in rc4: sometimes it would restart
>>> itself and sometimes we'd end up with a few zombie scheduler processes
>>> and things would get stuck. We're also running locally, without celery.
>>> 
>>>> On Mar 24, 2017 16:02, <lrohde@quartethealth.com> wrote:
>>>> 
>>>> We have max runs set and still hit this. Our solution is dumber: we
>>>> monitor the log output and kill the scheduler if it stops emitting.
>>>> Works like a charm.
>>>> 
>>>>> On Mar 24, 2017, at 5:50 PM, F. Hakan Koklu <fhakan.koklu@gmail.com> wrote:
>>>>> 
>>>>> Some solutions to this problem are restarting the scheduler frequently
>>>>> or setting up some sort of monitoring on the scheduler. We have set up a
>>>>> DAG that pings cronitor <https://cronitor.io/> (a dead man's snitch type
>>>>> of service) every 10 minutes, and the snitch pages you when the scheduler
>>>>> dies and stops sending pings.
>>>>> 
>>>>> On Fri, Mar 24, 2017 at 1:49 PM, Andrew Phillips <aphillips@qrmedia.com> wrote:
>>>>> 
>>>>>>> We use celery and run into it from time to time.
>>>>>> 
>>>>>> Bang goes my theory ;-) At least, assuming it's the same underlying
>>>>>> cause...
>>>>>> 
>>>>>> Regards
>>>>>> 
>>>>>> ap
>>>>>> 
>>>> 
>> 
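
A minimal Python sketch of the periodic-restart workaround, standing in for the cron job mentioned in the thread. It assumes the scheduler is launched as a child process from a command you supply; the function name and its parameters are hypothetical, not part of Airflow. Waiting on each child after terminating it reaps the process, which should help avoid leaving stray scheduler processes behind.

```python
import subprocess
from typing import List


def run_with_periodic_restart(cmd: List[str], restart_every: float, cycles: int) -> int:
    """Hypothetical supervisor: start cmd, stop it after restart_every seconds, start again.

    Runs a fixed number of cycles so it can be tested; a real supervisor would loop
    forever. Returns how many times the command was started.
    Note: only the direct child is signalled; processes it spawned are not.
    """
    starts = 0
    for _ in range(cycles):
        proc = subprocess.Popen(cmd)
        starts += 1
        try:
            # If the process exits on its own, it is simply started again next cycle.
            proc.wait(timeout=restart_every)
        except subprocess.TimeoutExpired:
            proc.terminate()  # polite SIGTERM first
            try:
                proc.wait(timeout=10)
            except subprocess.TimeoutExpired:
                proc.kill()  # escalate if it ignores SIGTERM
                proc.wait()  # always reap the child so it does not linger
    return starts
```

For example, `run_with_periodic_restart(["airflow", "scheduler"], restart_every=30 * 60, cycles=48)` would restart the scheduler every 30 minutes for a day, assuming the `airflow` executable is on `PATH`.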

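A minimal sketch of the log-monitoring workaround (kill the scheduler when it stops emitting), assuming the scheduler writes to a single log file whose modification time advances while it is healthy. The log path, the 10-minute threshold, and the helper names are hypothetical; restarting the process afterwards is left to whatever supervises it.

```python
import os
import signal
import time
from typing import Optional

# Hypothetical values: adjust to your deployment.
LOG_PATH = "/var/log/airflow/scheduler.log"
MAX_SILENCE_SECONDS = 10 * 60  # treat 10 minutes without log output as a hang


def log_is_stale(last_modified: float, now: float, max_silence: float) -> bool:
    """Return True if the log has not been written for longer than max_silence seconds."""
    return now - last_modified > max_silence


def check_and_kill(pid: int, log_path: str = LOG_PATH,
                   max_silence: float = MAX_SILENCE_SECONDS,
                   now: Optional[float] = None) -> bool:
    """Send SIGTERM to the scheduler process if its log has gone quiet.

    Returns True if a signal was sent. A missing log file is not treated as a hang.
    """
    now = time.time() if now is None else now
    try:
        last_modified = os.path.getmtime(log_path)
    except FileNotFoundError:
        return False  # no log yet: nothing to judge
    if log_is_stale(last_modified, now, max_silence):
        os.kill(pid, signal.SIGTERM)
        return True
    return False
```

Run from cron or a loop every minute or so, this reproduces the "kill it if it goes silent" behavior; pair it with a process manager that restarts the scheduler when it exits.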

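The cronitor approach is a dead man's switch: a DAG running every 10 minutes pings an external service, so pings only arrive while the scheduler is alive, and the service pages when they stop. The sketch below models only the monitoring side's decision in plain Python, assuming a fixed ping interval plus a grace period. It is not cronitor's API; the class and parameter names are hypothetical. In a real setup the DAG's task would make an HTTP request to the monitor's ping URL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeadMansSwitch:
    """Hypothetical model of a dead man's snitch: alert when pings stop arriving."""
    interval: float                    # expected seconds between pings (600 = every 10 minutes)
    grace: float = 0.0                 # extra slack to absorb scheduling delay
    last_ping: Optional[float] = None  # time of the most recent ping, if any

    def ping(self, now: float) -> None:
        # Called each time the heartbeat DAG runs and reaches the monitor.
        self.last_ping = now

    def should_alert(self, now: float) -> bool:
        # Before the first ping there is nothing to compare against, so no alert.
        if self.last_ping is None:
            return False
        return now - self.last_ping > self.interval + self.grace
```

With `interval=600` and `grace=120`, a ping at `t=0` means `should_alert(700)` is False and `should_alert(721)` is True. The grace period is a design choice: a scheduler that is merely slow to queue the heartbeat task should not page anyone.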