airflow-dev mailing list archives

From Maxime Beauchemin <maximebeauche...@gmail.com>
Subject Re: problem to update 'start_date' of DAG
Date Wed, 10 Aug 2016 19:38:38 GMT
Hi,

What do you mean by "discarding"? What outcome are you after?

If what you want is a DagRun that matches your start_date, you can do that
from the UI: create a new DagRun that matches your desired start_date, which
essentially "re-seeds" the point from which future DagRuns will get
created. You may also want to deactivate older `running` DagRuns, which you
can do from the UI as well.
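
To make the "re-seed" idea concrete, here is a toy model in Python. It is
only an illustration, not the scheduler's actual code: the function name
next_run_date and the daily interval are made up for the example, and the
real scheduler applies more rules than this.

```python
from datetime import datetime, timedelta


def next_run_date(start_date, interval, existing_runs):
    """Toy model (hypothetical helper, not an Airflow API).

    With no DagRuns yet, the first run is based on start_date.
    Otherwise the next run follows the most recent existing DagRun.
    """
    if not existing_runs:
        return start_date
    return max(existing_runs) + interval


interval = timedelta(days=1)  # assumed daily schedule
new_start = datetime(2016, 8, 8)

# DagRuns left over from the original start date (Aug 1, 2 and 3).
old_runs = [datetime(2016, 8, 1) + timedelta(days=i) for i in range(3)]
print(next_run_date(new_start, interval, old_runs))  # 2016-08-04 00:00:00

# Creating a DagRun for the desired date moves the seed point forward.
seeded_runs = old_runs + [new_start]
print(next_run_date(new_start, interval, seeded_runs))  # 2016-08-09 00:00:00
```

With only the old runs in place, the next DagRun lands right after the last
of them; once a DagRun exists for 2016-08-08, later runs continue from there.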

Max

On Tue, Aug 9, 2016 at 9:24 PM, הילה ויזן <hilaviz@gmail.com> wrote:

> Hi Maxime,
>
> Thanks for the clarifications.
> I've already read this page while trying to find a solution to my problem.
>
> But I still have the question - is there any way to discard the previous
> definitions? (for example the 'start_date' of a DAG)
>
> Thanks
>
> On Wed, Aug 10, 2016 at 1:37 AM, Maxime Beauchemin <
> maximebeauchemin@gmail.com> wrote:
>
> > From http://pythonhosted.org/airflow/faq.html:
> >
> > *What’s the deal with ``start_date``?*
> >
> > start_date is partly legacy from the pre-DagRun era, but it is still
> > relevant in many ways. When creating a new DAG, you probably want to set
> > a global start_date for your tasks using default_args. The first DagRun
> > to be created will be based on the min(start_date) for all your tasks.
> > From that point on, the scheduler creates new DagRuns based on your
> > schedule_interval, and the corresponding task instances run as your
> > dependencies are met. When introducing new tasks to your DAG, you need to
> > pay special attention to start_date, and may want to reactivate inactive
> > DagRuns so that the new tasks get onboarded properly.
> >
> > We recommend against using dynamic values as start_date, especially
> > datetime.now(), as it can be quite confusing. The task is triggered once
> > the period closes, and in theory an @hourly DAG would never get to an
> > hour after now as now() moves along.
> >
> > Previously we also recommended using a rounded start_date in relation to
> > your schedule_interval. This meant an @hourly job would be at 00:00
> > minutes:seconds, a @daily job at midnight, a @monthly job on the first of
> > the month. This is no longer required. Airflow will now auto align the
> > start_date and the schedule_interval, by using the start_date as the
> > moment to start looking.
> >
> > You can use any sensor or a TimeDeltaSensor to delay the execution of
> > tasks within the schedule interval. While schedule_interval does allow
> > specifying a datetime.timedelta object, we recommend using the macros or
> > cron expressions instead, as it enforces this idea of rounded schedules.
> >
> > When using depends_on_past=True it’s important to pay special attention
> > to start_date as the past dependency is not enforced only on the specific
> > schedule of the start_date specified for the task. It’s also important to
> > watch DagRun activity status in time when introducing new
> > depends_on_past=True, unless you are planning on running a backfill for
> > the new task(s).
> >
> > Also important to note is that the task's start_date, in the context of a
> > backfill CLI command, gets overridden by the backfill command’s
> > start_date. This allows for a backfill on tasks that have
> > depends_on_past=True to actually start; if that wasn’t the case, the
> > backfill just wouldn’t start.
> >
> > On Tue, Aug 9, 2016 at 7:44 AM, הילה ויזן <hilaviz@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > We're experiencing a strange problem with the start_date configuration
> > > in Airflow.
> > >
> > > When we first ran the DAGs, we defined the start_date as
> > > 'datetime.now()', which at the time was 01/08/2016. This worked fine. A
> > > week afterwards, we changed the DAGs to a specific newer date,
> > > 08/08/2016, and reset all of the tasks. After resetting Airflow and all
> > > of the DAGs *we are still seeing the tasks running from the original
> > > date (01/08)*. Why is this happening?
> > >
> > > We don't understand why the tasks are still using the old date. Is
> > > there a cache/DB/persistent file that the DAG reads on startup that
> > > overrides our definition? Is it maybe Celery? We really would
> > > appreciate your input because we are totally stuck.
> > >
> > > We use Airflow version 1.7.1.3 with Postgres as the backend DB.
> > > In addition, we run in CeleryExecutor mode with RabbitMQ as the Celery
> > > backend.
> > >
> > > Thank you,
> > > Hila
> > >
> >
>
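
On the FAQ's warning about datetime.now() quoted above, here is a rough
sketch of why a moving start_date never gets scheduled. This is plain
Python, not Airflow code: period_closed and the 30-minute re-evaluation
cadence are assumptions made up for the illustration.

```python
from datetime import datetime, timedelta


def period_closed(start_date, interval, now):
    """True once the first schedule period [start_date, start_date + interval)
    has fully elapsed. Hypothetical helper, not an Airflow API."""
    return start_date + interval <= now


interval = timedelta(hours=1)        # stands in for @hourly
fixed_start = datetime(2016, 8, 8)   # static start_date

# Pretend the DAG file is re-evaluated every 30 minutes (assumed cadence).
clock = datetime(2016, 8, 8, 0, 30)
results = []
for _ in range(4):
    dynamic_start = clock  # what start_date=datetime.now() would evaluate to
    results.append((
        period_closed(fixed_start, interval, clock),    # static start_date
        period_closed(dynamic_start, interval, clock),  # moving start_date
    ))
    clock += timedelta(minutes=30)

print(results)
```

In this model the static start_date becomes schedulable as soon as its first
hour has elapsed, while the moving one never does, because its period is
always re-anchored to the current time.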
