airflow-dev mailing list archives

From Pawel Bartoszek <pawel.bartoszek....@gmail.com>
Subject Re: schedule_interval question
Date Thu, 18 Apr 2019 12:54:55 GMT
Ash, if I omit start_date I get the error:
Task is missing the start_date parameter

What should I set it to then?
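
One answer consistent with the FAQ linked in the quoted reply below is a fixed datetime in the past, for example datetime(2019, 4, 17, 2, 0) for this cron schedule. The sketch below is a simplified model of the scheduling rule, not Airflow's own scheduler code. It assumes a run becomes due only once start_date plus one full interval has elapsed, and the helper names first_trigger_time and is_due are hypothetical. It illustrates why a start_date that is re-evaluated to the current time on every parse never becomes due, while a fixed one does.

```python
from datetime import datetime, timedelta


def first_trigger_time(start_date, interval):
    """Assumed rule: the first run fires once one full interval has elapsed."""
    return start_date + interval


def is_due(start_date, interval, now):
    """True if the first interval after start_date has fully elapsed at `now`."""
    return now >= first_trigger_time(start_date, interval)


# A fixed 24-hour interval stands in for the cron expression "0 2 * * *".
interval = timedelta(days=1)

# Illustrative fixed start_date, one interval before the desired first run.
fixed_start = datetime(2019, 4, 17, 2, 0)

# Moments at which the DAG file is (hypothetically) re-parsed.
parse_times = [
    datetime(2019, 4, 18, 1, 0),
    datetime(2019, 4, 18, 2, 0),
    datetime(2019, 4, 19, 2, 0),
]

# With a fixed start_date the first run becomes due at 2019-04-18 02:00.
fixed_due = [is_due(fixed_start, interval, t) for t in parse_times]

# With start_date=datetime.now() the start moves to the parse time itself,
# so start_date + interval is always in the future.
moving_due = [is_due(t, interval, t) for t in parse_times]

print(fixed_due)   # [False, True, True]
print(moving_due)  # [False, False, False]
```

Under this model the run that fires at 2019-04-18 02:00 carries the execution date 2019-04-17 02:00, the start of the interval it covers.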

On Thu, Apr 18, 2019 at 1:03 PM Ash Berlin-Taylor <ash@apache.org> wrote:

> Do not set start_date to now. That will _always_ be wrong.
> https://airflow.apache.org/faq.html#what-s-the-deal-with-start-date
>
> > On 18 Apr 2019, at 12:13, Pawel Bartoszek <pawel.bartoszek.bbc@gmail.com> wrote:
> >
> > Hi,
> >
> > When I set start_date to datetime.now(), i.e.
> >
> > DAG(
> >        dag_id="dag",
> >        start_date=datetime.now(),
> >        schedule_interval="0 2 * * *",
> >        default_view="graph",
> >        orientation="TB",
> >        concurrency=1,
> >        max_active_runs=1,
> >        catchup=False
> > )
> >
> > I get the following info in the task instance details:
> >
> > Dependency      Reason
> > Execution Date  The execution date is 2019-04-18T11:09:16.193396+00:00 but
> >                 this is before the task's start date 2019-04-18T11:10:42.607861+00:00.
> > Execution Date  The execution date is 2019-04-18T11:09:16.193396+00:00 but
> >                 this is before the task's DAG's start date 2019-04-18T11:10:42.607861+00:00.
> > Dagrun Running  Task instance's dagrun did not exist: Unknown reason.
> >
> > I thought the execution date should be set to 2019-04-19 02:00?
> >
> >
> > On Wed, Apr 17, 2019 at 8:37 PM Chao-Han Tsai <milton0825@gmail.com> wrote:
> >
> >> Hi Pawel,
> >>
> >> I think you can change the start_date to a later date to avoid the DagRun
> >> of 2019-04-16 02:00 being scheduled.
> >>
> >> Chao-Han
> >>
> >> On Wed, Apr 17, 2019 at 10:13 AM Pawel Bartoszek <pawel.bartoszek.bbc@gmail.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> Let's say I deploy the following DAG at 2019-04-17 5 PM
> >>>
> >>> DAG(
> >>>        dag_id="dag",
> >>>        start_date=datetime(year=2018, month=1, day=1, hour=2, minute=0),
> >>>        schedule_interval="0 2 * * *",
> >>>        default_view="graph",
> >>>        orientation="TB",
> >>>        concurrency=1,
> >>>        max_active_runs=1,
> >>>        catchup=False)
> >>>
> >>>
> >>> I noticed that the DAG is first scheduled for yesterday, i.e. 2019-04-16
> >>> 2 AM. How can I avoid this? I want the DAG to be scheduled in the future
> >>> according to the cron expression, i.e. 2019-04-18 2 AM.
> >>>
> >>> With schedule_interval set as
> >>>
> >>> schedule_interval=timedelta(hours=24),
> >>>
> >>> correct me if I am wrong, but Airflow seems to schedule the DAG 24 hours
> >>> in the past from the time the DAG was deployed.
> >>>
> >>> Thanks,
> >>> Pawel
> >>>
> >>
> >>
> >> --
> >>
> >> Chao-Han Tsai
> >>
>
>
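
The 2019-04-16 02:00 run described in the first message of this thread can be reproduced with a small calculation. This is a rough sketch under two assumptions: that with catchup=False the scheduler creates only the most recent interval that has fully elapsed, and that a run is labelled with the start of the interval it covers. latest_completed_execution_date is a hypothetical helper, not an Airflow function, and a fixed 24-hour interval stands in for the cron expression "0 2 * * *".

```python
from datetime import datetime, timedelta


def latest_completed_execution_date(start_date, interval, now):
    """Return the start of the most recent fully elapsed interval, or None.

    Simplified model: intervals are start_date + k * interval, and a run
    labelled with the interval start fires only after the interval ends.
    """
    if now < start_date + interval:
        return None  # no interval has completed yet
    elapsed_intervals = (now - start_date) // interval
    return start_date + (elapsed_intervals - 1) * interval


interval = timedelta(days=1)
start_date = datetime(2018, 1, 1, 2, 0)     # start_date from the first message
deployed_at = datetime(2019, 4, 17, 17, 0)  # "2019-04-17 5 PM"

first_exec = latest_completed_execution_date(start_date, interval, deployed_at)
print(first_exec)  # 2019-04-16 02:00:00

# The following run covers 2019-04-17 02:00 to 2019-04-18 02:00 and
# therefore fires at 2019-04-18 02:00, labelled 2019-04-17 02:00.
next_exec = first_exec + interval
print(next_exec, "fires at", next_exec + interval)
```

Under these assumptions the run seen right after deployment is not a scheduling error: it is the interval that closed at 2019-04-17 02:00, which is the latest one already complete at deployment time.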
