airflow-dev mailing list archives

From Ash Berlin-Taylor <...@apache.org>
Subject Re: schedule_interval question
Date Thu, 18 Apr 2019 12:03:32 GMT
Do not set start_date to now. That will _always_ be wrong. https://airflow.apache.org/faq.html#what-s-the-deal-with-start-date
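
To illustrate the point, here is a minimal sketch of the date arithmetic in plain Python. It is a simplified model of a daily "0 2 * * *" schedule with catchup=False, not Airflow's scheduler code, and the helper name latest_daily_execution_date is hypothetical. It assumes a run for execution date D is created only after the interval [D, D + 1 day) has ended, and that D may not be earlier than start_date.

```python
from datetime import datetime, timedelta


def latest_daily_execution_date(now, start_date, hour=2):
    """Return the execution date of the latest completed daily interval, or None.

    Simplified model (an assumption, not Airflow's implementation): the
    schedule ticks once a day at `hour`:00, and the run created at a tick
    carries the execution date of the interval that just ended.
    """
    # Most recent tick at or before `now`.
    tick = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if tick > now:
        tick -= timedelta(days=1)
    # The interval that ended at that tick started one day earlier.
    execution_date = tick - timedelta(days=1)
    # A run is only allowed if its execution date is not before start_date.
    return execution_date if execution_date >= start_date else None


deployed = datetime(2019, 4, 17, 17, 0)

# Static start_date far in the past: the latest completed interval runs at once.
past_start = latest_daily_execution_date(deployed, datetime(2018, 1, 1, 2, 0))

# start_date=now: "now" is re-evaluated on every parse, so it is always later
# than any completed interval and no run ever qualifies.
moving_start = latest_daily_execution_date(deployed, deployed)

# Static start_date on the latest past tick: nothing at deploy time,
# first run once the 2019-04-18 02:00 tick has passed.
fixed = datetime(2019, 4, 17, 2, 0)
at_deploy = latest_daily_execution_date(deployed, fixed)
next_morning = latest_daily_execution_date(datetime(2019, 4, 18, 2, 0, 1), fixed)
```

Under this model, past_start is 2019-04-16 02:00 (the run reported earlier in the thread), moving_start and at_deploy are None, and next_morning is 2019-04-17 02:00. In other words, a fixed start_date aligned with the schedule gives a first run at the next 2 AM, labelled with the start of the interval it covers.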

> On 18 Apr 2019, at 12:13, Pawel Bartoszek <pawel.bartoszek.bbc@gmail.com> wrote:
> 
> Hi,
> 
> When I set start_date to datetime.now(), i.e.
> 
> DAG(
>        dag_id="dag",
>        start_date=datetime.now(),
>        schedule_interval="0 2 * * *",
>        default_view="graph",
>        orientation="TB",
>        concurrency=1,
>        max_active_runs=1,
>        catchup=False
> )
> 
> I get the following info in the task instance details:
> 
> Dependency: Reason
> Execution Date: The execution date is 2019-04-18T11:09:16.193396+00:00 but
> this is before the task's start date 2019-04-18T11:10:42.607861+00:00.
> Execution Date: The execution date is 2019-04-18T11:09:16.193396+00:00 but
> this is before the task's DAG's start date 2019-04-18T11:10:42.607861+00:00.
> Dagrun Running: Task instance's dagrun did not exist: Unknown reason.
> 
> I thought the execution date should be set to 2019-04-19 02:00?
> 
> 
> On Wed, Apr 17, 2019 at 8:37 PM Chao-Han Tsai <milton0825@gmail.com> wrote:
> 
>> Hi Pawel,
>> 
>> I think you can change the start_date to a later date to avoid the DagRun
>> for 2019-04-16 02:00 being scheduled.
>> 
>> Chao-Han
>> 
>> On Wed, Apr 17, 2019 at 10:13 AM Pawel Bartoszek <
>> pawel.bartoszek.bbc@gmail.com> wrote:
>> 
>>> Hi,
>>> 
>>> Let's say I deploy the following DAG at 2019-04-17 5 PM:
>>> 
>>> DAG(
>>>        dag_id="dag",
>>>        start_date=datetime(year=2018, month=1, day=1, hour=2, minute=0),
>>>        schedule_interval="0 2 * * *",
>>>        default_view="graph",
>>>        orientation="TB",
>>>        concurrency=1,
>>>        max_active_runs=1,
>>>        catchup=False)
>>> 
>>> 
>>> I noticed that the DAG is first scheduled for yesterday, i.e. 2019-04-16
>>> 2 AM. How can I avoid this? I want the DAG to be scheduled in the future
>>> according to the cron expression, i.e. 2019-04-18 2 AM.
>>> 
>>> Setting schedule_interval as
>>> 
>>> schedule_interval=timedelta(hours=24),
>>> 
>>> correct me if I am wrong, but Airflow seems to schedule the DAG 24 hours
>>> in the past from the time the DAG was deployed.
>>> 
>>> Thanks,
>>> Pawel
>>> 
>> 
>> 
>> --
>> 
>> Chao-Han Tsai
>> 

