airflow-dev mailing list archives

From Ash Berlin-Taylor <...@apache.org>
Subject [DISCUSS] period_start/period_end instead of execution_date/next_execution_date
Date Tue, 09 Apr 2019 14:07:59 GMT
(trying to break this out into another thread)

The ML doesn't allow images, but I'm guessing it is the deps section of a task instance
details screen?

I'm not saying it's not clear once you know to look there, but I'm trying to remove/reduce the
confusion in the first place. And I think we as committers aren't best placed to know what
makes sense, as we have internalised how Airflow works :)

So I guess this is a question to the newest people on the list: Would `period_start` and `period_end`
have been more or less confusing for you when you were first getting started with Airflow?
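
To make the proposed names concrete, here is a minimal sketch for a fixed schedule interval, using only the standard library. The function `describe_run` and its dictionary keys are hypothetical and not part of Airflow; the sketch only assumes that a run covers the interval from `period_start` up to `period_end` and cannot be triggered before that interval has closed.

```python
from datetime import datetime, timedelta


def describe_run(period_start, interval):
    """Map the proposed names onto the existing ones for a fixed interval.

    Hypothetical helper for illustration only; it is not an Airflow API.
    """
    period_end = period_start + interval
    return {
        "period_start": period_start,    # what is called execution_date today
        "period_end": period_end,        # what is called next_execution_date today
        "earliest_trigger": period_end,  # the run waits until the period has closed
    }


run = describe_run(datetime(2019, 4, 8), timedelta(days=1))
print(run["period_start"], "->", run["period_end"])
```

Read this way, a run labelled 2019-04-08 on a daily schedule starts no earlier than 2019-04-09, which is the "a day behind" effect that the current name hides.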

-ash

> On 9 Apr 2019, at 14:47, Driesprong, Fokko <fokko@driesprong.frl> wrote:
> 
> Ash,
> 
> Personally, I think this is quite clear; there is a list of reasons why the job isn't
> being scheduled:
> 
> 
> Coming back to Bas's question, I believe that yesterday_ds does not make sense, since
> we cannot assume that the schedule is daily. I don't see any usage of this variable. Personally,
> I do use next_execution_date quite extensively. Suppose you have a job that runs daily, but you
> want to change it to an hourly job. In that case you don't want to change
> {{ (execution_date + macros.timedelta(days=1)) }} to
> {{ (execution_date + macros.timedelta(hours=1)) }} everywhere.
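
The difference can be sketched in plain Python. The function `next_execution_date` below is a hypothetical stand-in for the value Airflow derives from the DAG schedule, and it assumes a fixed-length interval; it is not Airflow's own implementation.

```python
from datetime import datetime, timedelta


def next_execution_date(execution_date, schedule_interval):
    # Hypothetical stand-in: the end of the interval follows the schedule,
    # so callers never spell out the interval length themselves.
    return execution_date + schedule_interval


execution_date = datetime(2019, 4, 9, 0, 0)

# Hard-coded arithmetic is only correct while the schedule stays daily.
hard_coded_end = execution_date + timedelta(days=1)

# The schedule-derived value stays correct when the schedule changes.
daily_end = next_execution_date(execution_date, timedelta(days=1))
hourly_end = next_execution_date(execution_date, timedelta(hours=1))

print(hard_coded_end == daily_end)   # same value under a daily schedule
print(hard_coded_end == hourly_end)  # diverges once the job becomes hourly
```

With the schedule-derived value, switching the job from daily to hourly means changing the schedule in one place rather than every template that adds a timedelta.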
> 
> I'm just not sure if the aggressive deprecation is really worth it. I don't see too
> much harm in letting them stay.
> 
> Cheers, Fokko 
> 
> On Tue 9 Apr 2019 at 12:17, Ash Berlin-Taylor <ash@apache.org> wrote:
> To (slightly) hijack this thread:
> 
> On the subject of execution_date: as I'm sure we're all aware, the concept of execution_date
> is confusing to newcomers to Airflow (there are many questions such as "Why hasn't my DAG
> run yet?" and "Why is my DAG a day behind?"), and although we mention this in the docs it
> remains a confusing concept.
> 
> What do people think about adding two new parameters, `period_start` and `period_end`,
> and making these the preferred terms in place of execution_date and next_execution_date?
> 
> This hopefully avoids any ambiguous terms like "execution" or "run", which are understandably
> easy to conflate with the time the task is being run (i.e. `now()`).
> 
> If people think this naming is better and less confusing, I would suggest we update all
> the docs and examples to use these terms (but still mention the old names somewhere, probably
> in the macros docs).
> 
> Thoughts?
> 
> -ash
> 
> 
> > On 8 Apr 2019, at 16:20, Arthur Wiedmer <arthur.wiedmer@gmail.com> wrote:
> > 
> > Hi Bas,
> > 
> > 1) I am aware of a few places where those parameters are used in production
> > in a few hundred jobs. I highly recommend we don't deprecate them unless we
> > do it in a major version.
> > 
> > 2) As James mentioned, inlets and outlets are a lineage annotation feature
> > which is still under development. Let's leave them in, but we can guard
> > them behind a feature flag if you prefer.
> > 
> > 3) the yesterday*/tomorrow* params are convenience ones if you use a daily
> > ETL. I agree with you that they are simple to compute, but not everyone
> > using Apache Airflow is amazing with Python. Some users are only trying to
> > get a query scheduled and rely on a couple of niceties like these to get by.
> > 
> > 4) latest_date, end_date (I feel like there used to be start_date, but
> > maybe it got lost) were a blend of things which were used by a backfill
> > framework used internally at Airbnb. Latest date was used if you needed to
> > join to a dimension for which you only wanted the latest version of the
> > attributes in your backfill. end_date was used for time ranges where several
> > days were processed together in a range to save on compute. I don't see an
> > issue with removing them.
> > 
> > Best regards,
> > Arthur
> > 
> > 
> > 
> > On Mon, Apr 8, 2019 at 5:37 AM Bas Harenslak <basharenslak@godatadriven.com>
> > wrote:
> > 
> >> Hi all,
> >> 
> >> Following Tao Feng’s question to discuss this PR
> >> <https://github.com/apache/airflow/pull/5010> (AIRFLOW-4192
> >> <https://issues.apache.org/jira/browse/AIRFLOW-4192>), please discuss here
> >> if you agree/disagree/would change.
> >> 
> >> -----------
> >> 
> >> The summary of the PR:
> >> 
> >> I was confused by the task context values and suggest cleaning up and
> >> clarify these variables. Some are derivations from other variables, some
> >> are undocumented and unused, some are wrong (name doesn’t match the value).
> >> Please discuss what you think of the removal of these variables:
> >> 
> >> 
> >>  *   Removed yesterday_ds, yesterday_ds_nodash, tomorrow_ds,
> >> tomorrow_ds_nodash. IMO the next_* and previous_* variables are useful
> >> since these require complex logic to compute the next execution date,
> >> however would leave computing the yesterday* and tomorrow* variables up to
> >> the user since they are simple one-liners and don't relate to the DAG
> >> interval.
> >>  *   Removed tables. This is a field in params, and is thus also
> >> accessible by the user ({{ params.tables }}). Also, it was undocumented.
> >>  *   Removed latest_date. It's the same as ds and was also undocumented.
> >>  *   Removed inlets and outlets. Also undocumented, and have the
> >> inlets/outlets ever worked/ever been used by anybody?
> >>  *   Removed end_date and END_DATE. Both have the same value, so it
> >> doesn't make sense to have both variables. Also, the value is ds which
> >> contains the start date of the interval, so the naming didn't make sense to
> >> me. However, if anybody argues in favour of adding "start_date" and
> >> "end_date" to provide the start and end datetime of task instance
> >> intervals, I'd be happy to add them.
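
As a rough illustration of the "simple one-liners" that would replace the yesterday*/tomorrow* variables, the sketch below uses plain Python. The helper `shifted_ds` is hypothetical and not part of Airflow; it assumes the `ds` values are the execution date formatted as YYYY-MM-DD, with the `_nodash` variants formatted as YYYYMMDD.

```python
from datetime import datetime, timedelta


def shifted_ds(execution_date, days, nodash=False):
    """Format execution_date shifted by a number of days.

    Hypothetical helper for illustration only; it is not an Airflow API.
    """
    fmt = "%Y%m%d" if nodash else "%Y-%m-%d"
    return (execution_date + timedelta(days=days)).strftime(fmt)


execution_date = datetime(2019, 4, 8)

yesterday_ds = shifted_ds(execution_date, -1)
tomorrow_ds_nodash = shifted_ds(execution_date, 1, nodash=True)

print(yesterday_ds, tomorrow_ds_nodash)
```

The shift is a fixed number of calendar days and is independent of the DAG's schedule interval, which is the distinction drawn above between these variables and the next_*/previous_* ones.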
> >> 
> >> Cheers,
> >> Bas
> >> 
> 

