airflow-dev mailing list archives

From lro...@quartethealth.com
Subject Re: Scheduler silently dies
Date Fri, 24 Mar 2017 20:37:18 GMT
We use Celery and run into this from time to time.

On Mar 24, 2017, at 4:16 PM, Andrew Phillips <andrewp@apache.org> wrote:

>> Does anyone have any idea why this happens? It seems like a bug that should
>> be fixed, but we're all just living with it instead of trying to fix it.
> 
> From the little I understand, one of the main problems here is that it seems very difficult
> to reliably reproduce the issue. There are a bunch of threads (e.g. [1, 2]) related to this
> topic, and our Airflow setup has the same issue, but in our case I unfortunately still haven't
> been able to come up with a suitable DAG and triggering config that is guaranteed to cause
> this.
> 
> My best guess at this point, related to a recent comment from Maxime [3], is that some
> task execution in the local executor is causing this. I haven't yet been able to try Celery
> to see if that fixes the issue, though.
> 
> Regards
> 
> ap
> 
> [1] http://apache.markmail.org/thread/knfa2trexiwz4j2h
> [2] http://apache.markmail.org/thread/ru2gpw22sb5k6tyq
> [3] http://markmail.org/message/docquxgjnzcp27pi
> 
> 
>> Just my two cents.
>> -N
>> nik.hodgkinson@collectivehealth.com
>> On Fri, Mar 24, 2017 at 12:22 PM, harish singh <harish.singh22@gmail.com>
>> wrote:
>>> Happens on our setup, on 1.8 as well.
>>> We have kept this number (the -n value) at 10, which seems to work well for us.
>>> On Fri, Mar 24, 2017 at 12:16 PM, Nicholas Hodgkinson <
>>> nik.hodgkinson@collectivehealth.com> wrote:
>>> > So I'm experiencing a problem that I can't figure out; namely, my scheduler
>>> > just stops scheduling tasks for seemingly no reason. I've found this:
>>> > https://bug623317.bugzilla.mozilla.org/show_bug.cgi?id=1286825 which seems
>>> > to indicate that I should be restarting my scheduler frequently (I
>>> > currently have -n 200 set, which was working fine until recently); is this
>>> > still the case? And/or is this fixed in 1.8.0 (currently running 1.7.1.3)?
>>> > Anything to help me diagnose this problem or solutions to it would be much
>>> > appreciated.
>>> >
>>> > Thanks,
>>> > -Nik
>>> > nik.hodgkinson@collectivehealth.com
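
For readers trying the workaround discussed in this thread (running the scheduler with a small -n value and restarting it when it exits), here is a minimal sketch of a restart loop. It is only an illustration under stated assumptions: the helper names scheduler_command and supervise, the 5-second pause, and the injectable run/sleep parameters are all hypothetical and not part of Airflow. The only Airflow-specific piece is the "airflow scheduler -n N" invocation quoted above. Note that this only covers a scheduler that exits after its N runs; it would not help with a scheduler process that hangs without exiting.

```python
import subprocess
import time


def scheduler_command(num_runs):
    """Build the argv for one bounded scheduler run.

    -n is the flag quoted in the thread; the helper itself is hypothetical.
    """
    return ["airflow", "scheduler", "-n", str(num_runs)]


def supervise(num_runs=10, max_restarts=None, pause_seconds=5,
              run=subprocess.call, sleep=time.sleep):
    """Start the scheduler again each time it exits.

    max_restarts=None loops until interrupted; a number bounds the loop,
    which is mainly useful for testing. run and sleep are injectable so the
    loop can be exercised without launching Airflow.
    Returns the number of scheduler runs that were started.
    """
    started = 0
    while max_restarts is None or started < max_restarts:
        run(scheduler_command(num_runs))  # blocks until the scheduler exits
        started += 1
        sleep(pause_seconds)  # brief pause so a crash loop does not spin
    return started
```

Calling supervise(num_runs=10) would keep relaunching the scheduler until the process is interrupted. In practice a process manager configured to restart the command on exit would play the same role as this loop.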
