airflow-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Re: Identifying delay between schedule & run instances
Date Thu, 09 Aug 2018 06:27:30 GMT
Hi vardang,

What do you intend to gain from this metric? Many factors influence the difference
between execution date and start date. You named one of them, but there are also functional
ones (limits being reached, etc.). We are not a real-time system, so we have never purposefully
aimed to reduce that difference.

B.

Sent from my iPad

> On 9 Aug 2018 at 08:04, vardanguptacse@gmail.com <vardanguptacse@gmail.com> wrote:
> 
> 
> 
>> On 2018/08/06 07:07:05, vardanguptacse@gmail.com <vardanguptacse@gmail.com> wrote:
>> Hi Everyone,
>> 
>> We would like to calculate a metric that captures the delay (if any) between a DAG becoming
>> active in the scheduler and server and its tasks actually being kicked off (suppose, for
>> example, that start_date was 1 hour earlier and the schedule is every 10 minutes).
>> 
>> The task_instance table currently has execution_date, start_date, end_date and queued_dttm,
>> so we could easily derive this metric from the difference between start_date and
>> execution_date. For backfills, however, execution_date is a previous schedule occurrence,
>> so that difference is skewed: it works for measuring the scheduling delay of regular future
>> runs, but for backfills the number is not trustworthy. Any suggestions on how to identify
>> this metric reliably, maybe by somehow knowing which runs are backfills? Even the DAG table
>> has no create_date or update_date notion; is there any way to tell when a DAG was originally
>> brought into existence?
>> 
>> 
>> Regards,
>> Vardan Gupta
>> 
> Can someone look at the issue?
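A minimal sketch of the computation discussed in this thread, in Python (the language Airflow itself is written in). It works on plain in-memory rows rather than querying the metadata database. `TaskInstanceRow`, `scheduling_delay`, `queue_delay` and `BACKFILL_PREFIX` are hypothetical names, and the assumption that backfill runs carry a `backfill_` run_id prefix (taken from the matching dag_run row) should be checked against your Airflow version. The sketch measures delay against the time a run becomes due (execution_date plus the schedule interval, since Airflow triggers a run at the end of the period it covers), skips backfills for that metric, and also computes start_date minus queued_dttm, which does not depend on execution_date at all.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TaskInstanceRow:
    """Hypothetical in-memory copy of the task_instance columns mentioned above."""
    execution_date: datetime
    start_date: datetime
    queued_dttm: Optional[datetime]
    run_id: str  # assumed to be joined in from the matching dag_run row


# Assumption: backfill runs are recognisable by this run_id prefix.
BACKFILL_PREFIX = "backfill_"


def scheduling_delay(row: TaskInstanceRow,
                     schedule_interval: timedelta) -> Optional[timedelta]:
    """Delay between when a run was due and when the task actually started.

    A run for execution_date X is due at X + schedule_interval (the end of
    the period it covers). Backfill runs are skipped (None) because their
    execution_date lies far in the past and would skew the metric.
    """
    if row.run_id.startswith(BACKFILL_PREFIX):
        return None
    due = row.execution_date + schedule_interval
    return row.start_date - due


def queue_delay(row: TaskInstanceRow) -> Optional[timedelta]:
    """Time between queueing and starting; usable for backfills too."""
    if row.queued_dttm is None:
        return None
    return row.start_date - row.queued_dttm


if __name__ == "__main__":
    interval = timedelta(minutes=10)
    scheduled = TaskInstanceRow(
        execution_date=datetime(2018, 8, 6, 7, 0),
        start_date=datetime(2018, 8, 6, 7, 12),
        queued_dttm=datetime(2018, 8, 6, 7, 11),
        run_id="scheduled__2018-08-06T07:00:00",
    )
    backfill = TaskInstanceRow(
        execution_date=datetime(2018, 8, 1, 0, 0),
        start_date=datetime(2018, 8, 6, 7, 30),
        queued_dttm=datetime(2018, 8, 6, 7, 29, 30),
        run_id="backfill_2018-08-01T00:00:00",
    )
    print(scheduling_delay(scheduled, interval))  # 0:02:00
    print(scheduling_delay(backfill, interval))   # None
    print(queue_delay(backfill))                  # 0:00:30
```

One caveat of this definition: catch-up runs created by the scheduler itself (when start_date is in the past, as in the example above) are not backfills, so they would still report large delays and may need to be filtered separately.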
