airflow-dev mailing list archives

From David Capwell <dcapw...@gmail.com>
Subject Re: Running dag run doesn't schedule task
Date Tue, 20 Mar 2018 03:21:47 GMT
Current theory is priority: this dag is high fanout, whereas other DAGs are
much deeper.  Looking at the scheduler code and database, this looks hard to
prove since I only see logs for success and the DB doesn't distinguish between
runnable and not scheduled; is there a good way to check schedule delay?
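
One rough way to approximate schedule delay from timestamps alone is sketched below. It is only an illustration under stated assumptions, not something taken from the scheduler code: the helper names `runnable_since` and `schedule_delay` are hypothetical, and the inputs are assumed to be values read out of the metadata DB by hand (the run's execution date, the end times of the upstream tasks, and the task's own start time). The idea is that a task becomes runnable once its schedule period has closed and all of its upstream tasks have finished, so the gap between that moment and the actual start is the delay.

```python
from datetime import datetime, timedelta


def runnable_since(execution_date, interval, upstream_end_dates):
    """Earliest moment the task could have run (hypothetical helper).

    Assumes a run may start once its schedule period has closed
    (execution_date + interval) and every upstream task has ended.
    """
    period_end = execution_date + interval
    return max([period_end] + list(upstream_end_dates))


def schedule_delay(runnable_at, start_date, now):
    """Time spent runnable but not started (hypothetical helper).

    If the task has not started yet (start_date is None), the delay
    is measured up to `now` instead. Negative gaps are clamped to zero.
    """
    observed = start_date if start_date is not None else now
    return max(observed - runnable_at, timedelta(0))


if __name__ == "__main__":
    # Illustrative values only, not taken from a real deployment.
    execution_date = datetime(2018, 3, 19, 8, 0, 0)
    upstream_ends = [datetime(2018, 3, 19, 9, 0, 30)]
    started = datetime(2018, 3, 19, 14, 0, 0)
    now = datetime(2018, 3, 19, 15, 0, 0)

    ready = runnable_since(execution_date, timedelta(hours=1), upstream_ends)
    print("runnable since:", ready)
    print("delay:", schedule_delay(ready, started, now))
```

Passing `None` as the start time gives the delay so far for a task that is still waiting, which is the case the DB state alone does not make visible.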

On Mon, Mar 19, 2018, 6:15 PM David Capwell <dcapwell@gmail.com> wrote:

> Ignore that, must be something with Splunk since stdout doesn't have a
> date field; the same process writing to a file is printing that out and
> "Filling" is before that line...
>
> On Mon, Mar 19, 2018, 5:35 PM David Capwell <dcapwell@gmail.com> wrote:
>
>> This is weird, and I hope it is not a bad UTC conversion tricking me....
>>
>>
>> So the Splunk logs for the worker show the process logs were created at 9am
>> ("Logging into: ...."), but the first entry of the log was at 14:00 ("Filling
>> up the DagBag").  If I go to the DB and calculate queue time, this specific
>> dag was delayed 5 hours, which matches the logs...
>>
>>
>>
>> On Mon, Mar 19, 2018, 9:10 AM David Capwell <dcapwell@gmail.com> wrote:
>>
>>> The major reason we have been waiting is that 1.8.2 and 1.9 are
>>> backwards incompatible (I don't remember the details off the top of my
>>> head, but one operator's import broke, so everything failed for us), so we
>>> neglected doing the work to support both versions (we need to support both
>>> since different teams move at different rates).
>>>
>>> We need to do this anyways (frozen in time is very bad).
>>>
>>> On Mon, Mar 19, 2018, 1:47 AM Driesprong, Fokko <fokko@driesprong.frl>
>>> wrote:
>>>
>>>> Hi David,
>>>>
>>>> First I would update to Apache Airflow 1.9.0, just to see if the bug is
>>>> still in there; there have been a lot of fixes between 1.8.2 and 1.9.0.
>>>>
>>>> Cheers, Fokko
>>>>
>>>> 2018-03-18 19:41 GMT+01:00 David Capwell <dcapwell@gmail.com>:
>>>>
>>>> > Thanks for the reply
>>>> >
>>>> > Our script doesn't set it so should be off; the process does not normally
>>>> > restart (monitoring has a counter for number of restarts since deploy,
>>>> > currently at 0)
>>>> >
>>>> > At the point in time the UI showed the upstream tasks as green (success);
>>>> > we manually ran tasks so no longer in the same state, so can't check UI
>>>> > right now
>>>> >
>>>> > On Sun, Mar 18, 2018, 11:34 AM Bolke de Bruin <bdbruin@gmail.com> wrote:
>>>> >
>>>> > > Are you running with num_runs? If so disable it. We have seen this
>>>> > > behavior with num_runs. Also you can find out by clicking on the task if
>>>> > > there is a dependency issue.
>>>> > >
>>>> > > B.
>>>> > >
>>>> > > Sent from my iPad
>>>> > >
>>>> > > > On 18 Mar 2018, at 19:08, David Capwell <dcapwell@gmail.com> wrote:
>>>> > > >
>>>> > > > We just started seeing this a few days ago after turning on SLA for our
>>>> > > > tasks (not saying SLA did this, may have been happening before and not
>>>> > > > noticing), but we have a dag that runs once an hour and we see that 4-5 dag
>>>> > > > runs are marked running but tasks are not getting scheduled. When we get
>>>> > > > the SLA alert the action we are doing right now is going to the UI and
>>>> > > > clicking run on tasks manually; this is only needed for the oldest dag run
>>>> > > > and the rest recover after that. In the past 3 days this has happened twice
>>>> > > > to us.
>>>> > > >
>>>> > > > We are running 1.8.2, are there any known JIRA issues about this? Don't know
>>>> > > > scheduler well, what could I do to see why these tasks are getting skipped
>>>> > > > without manual intervention?
>>>> > > >
>>>> > > > Thanks for your time.
>>>> > >
>>>> >
>>>>
>>>
