airflow-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [airflow] ashb commented on a change in pull request #6295: [AIRFLOW-XXX] Adding Task re-run documentation
Date Fri, 11 Oct 2019 10:21:43 GMT
ashb commented on a change in pull request #6295: [AIRFLOW-XXX] Adding Task re-run documentation
URL: https://github.com/apache/airflow/pull/6295#discussion_r333920466
 
 

 ##########
 File path: docs/dag-run.rst
 ##########
 @@ -0,0 +1,193 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+DAG Runs
+=========
+A DAG Run is an object representing an instantiation of the DAG in time.
+
+Each DAG may or may not have a schedule, which informs how ``DAG Runs`` are
+created. ``schedule_interval`` is defined as a DAG argument and preferably
+receives a
+`cron expression <https://en.wikipedia.org/wiki/Cron#CRON_expression>`_ as
+a ``str``, or a ``datetime.timedelta`` object. Alternatively, you can also
+use one of these cron "presets":
+
++--------------+----------------------------------------------------------------+---------------+
+| preset       | meaning                                                        | cron          |
++==============+================================================================+===============+
+| ``None``     | Don't schedule, use for exclusively "externally triggered"     |               |
+|              | DAGs                                                           |               |
++--------------+----------------------------------------------------------------+---------------+
+| ``@once``    | Schedule once and only once                                    |               |
++--------------+----------------------------------------------------------------+---------------+
+| ``@hourly``  | Run once an hour at the beginning of the hour                  | ``0 * * * *`` |
++--------------+----------------------------------------------------------------+---------------+
+| ``@daily``   | Run once a day at midnight                                     | ``0 0 * * *`` |
++--------------+----------------------------------------------------------------+---------------+
+| ``@weekly``  | Run once a week at midnight on Sunday morning                  | ``0 0 * * 0`` |
++--------------+----------------------------------------------------------------+---------------+
+| ``@monthly`` | Run once a month at midnight of the first day of the month     | ``0 0 1 * *`` |
++--------------+----------------------------------------------------------------+---------------+
+| ``@yearly``  | Run once a year at midnight of January 1                       | ``0 0 1 1 *`` |
++--------------+----------------------------------------------------------------+---------------+
+
+Your DAG will be instantiated for each scheduled interval, along with a corresponding
+``DAG Run`` entry in the metadata database.
+
+**Note**: If you run a DAG on a ``schedule_interval`` of one day, the run stamped 2020-01-01
+will be triggered soon after 2020-01-01T23:59. In other words, the job instance is
+started once the period it covers has ended. The ``execution_date`` of that DAG Run
+will still be 2020-01-01.
+
+The first ``DAG Run`` is created based on the minimum ``start_date`` for the tasks in your DAG.
+Subsequent ``DAG Runs`` are created sequentially by the scheduler process, based on your DAG’s
+``schedule_interval``. If your ``start_date`` is 2020-01-01 and your ``schedule_interval`` is
+``@daily``, the first run will be created on 2020-01-02, i.e. once the first daily interval,
+starting at your start date, has ended.
+
+Re-run DAG
+''''''''''
+There can be cases where you will want to execute your DAG again. One such case is when the
+scheduled DAG run fails. Another is when the scheduled DAG run was not executed because of
+insufficient resources or because the DAG was turned off.
+
+Catchup
+-------
+
+An Airflow DAG with a ``start_date``, possibly an ``end_date``, and a ``schedule_interval`` defines a
+series of intervals which the scheduler turns into individual DAG Runs and executes. A key capability
+of Airflow is that these DAG Runs are atomic and idempotent items. The scheduler, by default, will
+kick off a DAG Run for any interval that has not been run (or has been cleared). This concept is called Catchup.
 
 Review comment:
   This is mostly true, but slightly misleading in a few edge cases. I'm not sure if this is worth mentioning here or not.
   
   Catchup, and the scheduler in general, will not "fill in gaps" - it will only look forward from the most recent dag run.
   
   For example:
   
   - I have a daily dag with catchup=False running. This is "d0"
   - I pause that dag for 3 days
   - I then start it again.
   
   At this point we have dagruns for d0, d4, d5, ...
   
   - I edit the dag to set catchup=True
   
   The scheduler will not go and "fill in" d1, d2, d3.
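The gap behaviour described above can be illustrated with a toy model. This is a minimal sketch, not Airflow's scheduler code: `next_execution_date` and `tick` are hypothetical helpers, the schedule is assumed to be daily, first-run handling via `start_date` is left out, and a run is assumed to be created once its interval has ended.

```python
from datetime import date, timedelta

INTERVAL = timedelta(days=1)  # assumed daily schedule


def next_execution_date(runs, now, catchup):
    """Toy model (hypothetical helper): choose the next interval to run,
    looking only forward from the most recent existing run."""
    if not runs:
        return None  # first-run handling via start_date is out of scope
    latest_complete = now - INTERVAL  # newest interval that has fully ended
    candidate = max(runs) + INTERVAL  # never earlier than the newest run
    if not catchup:
        # Without catchup, skip straight to the newest finished interval.
        candidate = max(candidate, latest_complete)
    return candidate if candidate <= latest_complete else None


def tick(runs, now, catchup, paused=False):
    """One scheduler pass (hypothetical): create every run the model allows."""
    if paused:
        return
    while True:
        nxt = next_execution_date(runs, now, catchup)
        if nxt is None:
            break
        runs.append(nxt)


d0 = date(2020, 1, 1)
runs = [d0]  # the run stamped d0 already exists

# Paused for three days: nothing is created for d1, d2, d3.
for offset in range(2, 5):
    tick(runs, d0 + timedelta(days=offset), catchup=False, paused=True)

# Unpaused with catchup=False: jumps to the newest finished interval, d4.
tick(runs, d0 + timedelta(days=5), catchup=False)
tick(runs, d0 + timedelta(days=6), catchup=False)  # creates d5

# Switch to catchup=True: still only looks forward from d5.
tick(runs, d0 + timedelta(days=7), catchup=True)  # creates d6

print([(d - d0).days for d in runs])  # [0, 4, 5, 6]
```

In this model, flipping `catchup` to True never produces d1 to d3, because the next candidate is always computed from the newest existing run rather than from `start_date`.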

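The timing rule in the proposed docs/dag-run.rst (the daily run stamped 2020-01-01 starts soon after 2020-01-01T23:59, and with a 2020-01-01 `start_date` the first run is created on 2020-01-02) comes down to simple interval arithmetic. A minimal sketch, assuming a fixed `timedelta` schedule and naive datetimes; `trigger_time` and `first_runs` are hypothetical helpers, not Airflow APIs:

```python
from datetime import datetime, timedelta


def trigger_time(execution_date, interval):
    """A run stamped `execution_date` covers the period
    [execution_date, execution_date + interval) and starts once it has ended."""
    return execution_date + interval


def first_runs(start_date, interval, count):
    """Execution dates of the first `count` runs, created one interval apart."""
    return [start_date + i * interval for i in range(count)]


daily = timedelta(days=1)  # stands in for the @daily preset in this sketch
start = datetime(2020, 1, 1)
for stamp in first_runs(start, daily, 2):
    print(stamp.date(), "->", trigger_time(stamp, daily))
# 2020-01-01 -> 2020-01-02 00:00:00
# 2020-01-02 -> 2020-01-03 00:00:00
```

The run's stamp (its `execution_date`) marks the start of the period it covers, which is why it lags the wall-clock time at which the run actually starts by one full interval.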
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
