airflow-dev mailing list archives

From Kaxil Naik <kaxiln...@gmail.com>
Subject Re: execution_date - can we stop the confusion?
Date Wed, 26 Sep 2018 19:42:31 GMT
This has been clearly documented, as Bolke stated. It is an integral part of
Airflow, and a user learning Airflow needs to learn it. If you think about it
from an ETL perspective, it makes complete sense. Also, it would be good if you
could use your real name rather than "airflowuser"; your preference, though.

Also, bear in mind that there is always a learning curve for any project or
tool. A user should take time to learn the tool and understand it properly.
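
To make the convention discussed in this thread concrete, here is a minimal
sketch in plain Python. It is not Airflow code: the function name
earliest_run_time is hypothetical, and it assumes a fixed schedule_interval
expressed as a timedelta. It only illustrates that execution_date marks the
start (left bound) of the interval, while the run itself starts once that
interval has ended.

```python
from datetime import datetime, timedelta


def earliest_run_time(execution_date: datetime, schedule_interval: timedelta) -> datetime:
    """Hypothetical helper: when a run stamped with execution_date may start.

    The run covers [execution_date, execution_date + schedule_interval),
    so it can only be triggered once that period has ended.
    """
    return execution_date + schedule_interval


# A daily run stamped 2018-09-25 processes data for 25 September
# and is triggered at the end of that day.
execution_date = datetime(2018, 9, 25)
run_time = earliest_run_time(execution_date, timedelta(days=1))
print(execution_date.date(), "->", run_time)
```

So the date a user sees on the run is the date of the data being processed,
not the wall-clock time at which the DAG started.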

On Wed, 26 Sep 2018, 16:34 Maxime Beauchemin, <maximebeauchemin@gmail.com>
wrote:

> I think if you have a functional mindset (as in "functional data engineering"
> <https://medium.com/@maximebeauchemin/functional-data-engineering-a-modern-paradigm-for-batch-data-processing-2327ec32c42a>)
> as opposed to a cron mindset, using the left bound of the time interval
> makes a lot of sense. Things like your daily table partition keys align
> with your Airflow execution_date.
>
> The main thing is that whatever we do, we cannot break backwards
> compatibility. Offering both views (left bound/right bound), as has been
> proposed before, either as an environment setting or as a per-user
> preference, is even more confusing to me personally. Users would have to
> switch context as they help each other or change environments.
>
> Also note that your intuition may differ from other people's intuition, and
> that "unlearning" something is way harder than learning something.
>
> My personal take on this is to make this a rite of passage. This is just
> one of the many things you have to learn when learning Airflow.
>
> Max
>
> On Wed, Sep 26, 2018 at 8:18 AM Sam Elamin <hussam.elamin@gmail.com>
> wrote:
>
> > Hi Bolke
> >
> > Speaking as a consultant who is constantly training other teams on how to
> > use Airflow, I do frequently see this confusion.
> > Another one is how the batch_date is always batch_date + interval or, as
> > the docs make quite clear:
> >
> > "*Let’s Repeat That* The scheduler runs your job one schedule_interval AFTER
> > the start date, at the END of the period."
> >
> > Renaming it would make it simpler for newbies, but essentially they will
> > need to understand how Airflow behaves, execution_date being the batch
> > execution date not the run_date of the DAG
> >
> > I am actually in the process of writing a blog post
> > <https://samelamin.github.io/2017/04/27/Building-A-Datapipeline-part1/>
> > about this, on which I could use people's feedback.
> >
> > If it helps, I find that explaining how backfills work and why they are
> > important will drive home what the execution_date is :)
> >
> >
> > Regards
> > Sam
> >
> >
> >
> > On Wed, Sep 26, 2018 at 4:10 PM Bolke de Bruin <bdbruin@gmail.com>
> > wrote:
> >
> > > I don't think this makes sense, and I don't think that anyone had a real
> > > issue with this. Execution date has been clearly documented and is part of
> > > the core principles of Airflow. Renaming will create more confusion.
> > >
> > > Please note that I do think that as an anonymous user you cannot speak for
> > > any "new airflow user". That is a contradiction to me.
> > >
> > > Thanks
> > > Bolke
> > >
> > > > On 26 Sep 2018, at 07:59, airflowuser <airflowuser@protonmail.com.INVALID>
> > > > wrote:
> > > >
> > > > One of the most annoying and hard-to-understand things, against all
> > > > common sense, is the execution_date behavior. I assume that any new
> > > > Airflow user has been struggling with it.
> > > > The number of questions with answers referring to
> > > > https://airflow.apache.org/scheduler.html?scheduling-triggers is
> > > > uncountable.
> > > >
> > > > Most people mistakenly think that execution_date is the datetime at
> > > > which the DAG started to run.
> > > >
> > > > I suggest the following changes:
> > > > 1. Rename execution_date to something else, like run_stamped.
> > > > This name won't cause people to get confused.
> > > > 2. Add a new variable which indicates the actual datetime when the
> > > > DAG run was generated; call it execution_start_date. People seem to
> > > > want to know when the DAG actually started to be executed/run.
> > > >
> > > > These are only naming changes. There is no need to actually change the
> > > > behavior. This will only make things simpler: when a user encounters
> > > > run_stamped, they won't be confused by the name the way they are by
> > > > execution_date.
> > >
> >
>
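
The points made above about partition keys and backfills can also be sketched
in plain Python. This is a simplified illustration, not Airflow's actual
backfill logic: the helper names backfill_execution_dates and partition_key
are hypothetical, the range is assumed to be half-open [start, end), and the
interval is assumed to be a fixed timedelta.

```python
from datetime import datetime, timedelta
from typing import List


def backfill_execution_dates(start: datetime, end: datetime,
                             interval: timedelta) -> List[datetime]:
    """Hypothetical helper: left bound of every complete interval in [start, end)."""
    dates = []
    current = start
    while current + interval <= end:
        dates.append(current)
        current += interval
    return dates


def partition_key(execution_date: datetime) -> str:
    """Hypothetical daily partition key derived from the execution_date."""
    return execution_date.strftime("%Y-%m-%d")


# Re-processing three days of data: each run is identified by the day
# whose data it covers, regardless of when the backfill is launched.
dates = backfill_execution_dates(datetime(2018, 9, 1), datetime(2018, 9, 4),
                                 timedelta(days=1))
keys = [partition_key(d) for d in dates]
print(keys)
```

Because each run is keyed by the interval it covers, re-running a past period
writes to the same partition as the original run, which is why the left-bound
convention lines up with daily partition keys.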
