airflow-dev mailing list archives

From David Montgomery <davidmontgom...@gmail.com>
Subject Re: why is this always running? Should be once an hour
Date Fri, 26 Aug 2016 08:59:16 GMT
I tried this and it does not work: datetime(2016, 8, 23)

I have read the docs... what does the text below mean in plain English?  If I
add something to my cron, it just works.  If I set the start date to
yesterday, I assume that the next run will be in 5 minutes.  I assume that
the purpose of the start date is to anchor future runs, scheduling one every
5 mins after.  Is a start_date prior to now meant for a backfill?  So this
logic, for example:




   - The first DAG Run is created based on the minimum start_date for the
   tasks in your DAG.
   - Subsequent DAG Runs are created by the scheduler process, based on
   your DAG’s schedule_interval, sequentially.
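
As a rough illustration of the two rules above, here is a minimal sketch in
plain Python (not Airflow code). The helper closed_periods is a hypothetical
name made up for this example. It assumes, per the FAQ quoted further down in
this thread, that a run for a period is triggered only once that period has
closed.

```python
from datetime import datetime, timedelta


def closed_periods(start_date, interval, now):
    """Return the execution dates of every schedule period that has fully closed.

    Hypothetical helper, not part of Airflow: it only mimics the rule that the
    first run is anchored at start_date and later runs follow sequentially.
    """
    runs = []
    execution_date = start_date
    # A period [execution_date, execution_date + interval) is due once it has ended.
    while execution_date + interval <= now:
        runs.append(execution_date)
        execution_date += interval
    return runs


# Fixed start_date: the result depends only on how far "now" has advanced.
start = datetime(2016, 8, 26, 8, 0)
five_minutes = timedelta(minutes=5)

runs = closed_periods(start, five_minutes, now=datetime(2016, 8, 26, 8, 12))
for run in runs:
    print(run.isoformat())
```

Under these assumptions, a start_date in the past makes every closed period
since then a candidate run, which would look like a backfill.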

I will try posting on Stack Overflow.

Thanks

On Fri, Aug 26, 2016 at 4:20 PM, David Montgomery <davidmontgomery@gmail.com
> wrote:

> How?  That is taken from one of the examples:
>
> seven_days_ago = datetime.combine(datetime.today() - timedelta(7),
>                                   datetime.min.time())
>
> I tried this...
> mytime = datetime.combine(datetime.now()-timedelta(minutes=5),
>                                   datetime.min.time())
>
> What is an example that will work?
>
> Thanks
>
> On Fri, Aug 26, 2016 at 4:08 PM, Bolke de Bruin <bdbruin@gmail.com> wrote:
>
>> You are still using a dynamic start_date.
>>
>> Basically you are saying that the start_date is 5 minutes before now every
>> time it is evaluated, so the scheduler will always see a run as due:
>>
>> First iteration:
>> Start_date: current time 2016-01-01 00:05:00 - 5 minutes = 2016-01-01 00:00:00
>>
>> Second iteration:
>> Start_date: current time 2016-01-01 00:05:01 - 5 minutes = 2016-01-01 00:00:01
>>
>> Set your start_date to something absolute, that evaluates to something
>> fixed every time it is evaluated.
>>
>> Bolke
>>
>> > On 26 Aug 2016, at 10:02, David Montgomery <davidmontgomery@gmail.com> wrote:
>> >
>> > I changed the code and it just keeps running.  The start date is 5 minutes
>> > ago and the cron is set to run every 5 mins.  Every second the DAG is
>> > triggered. Wow, what am I missing in the docs?  I have a DAG that runs a
>> > Python script that outputs 1, 2, 3 in order to a log file for testing, and
>> > I expect this to happen every 5 mins.  Yet it's not.
>> >
>> > All I see is the below, about once a second:
>> > 1
>> > 2
>> > 3
>> > 1
>> > 2
>> > 3
>> > 1
>> > 2
>> > 3
>> > on and on, one line every second or two.
>> >
>> >
>> >
>> > default_args = {
>> >    'owner': 'airflow',
>> >    'depends_on_past': False,
>> >    "start_date":  datetime.now()-timedelta(minutes=5),
>> >    'email': ['test@gmail.com'],
>> >    'email_on_failure': True,
>> >    'email_on_retry': True,
>> >    'retries': 1,
>> >    'retry_delay': timedelta(minutes=5),
>> >    # 'queue': 'bash_queue',
>> >    # 'pool': 'backfill',
>> >    # 'priority_weight': 10,
>> >    # 'end_date': datetime(2016, 1, 1),
>> > }
>> >
>> > # */5 * * * *
>> > dag = DAG('first_test', schedule_interval="*/5 * * * *",
>> > default_args=default_args)
>> >
>> >
>> > node_0 = PythonOperator(
>> >    task_id='isnewdata',
>> >    provide_context=False,
>> >    python_callable=checkfornewdata,
>> >    dag=dag)
>> >
>> >
>> > node_0_1 = PythonOperator(
>> >    task_id='fetchdata',
>> >    provide_context=False,
>> >    python_callable=fetchdata,
>> >    dag=dag)
>> >
>> > node_0_1_2 = PythonOperator(
>> >    task_id='uploadtoes',
>> >    provide_context=False,
>> >    python_callable= uploadtoes,
>> >    dag=dag)
>> >
>> >
>> > node_0_1.set_upstream(node_0)
>> > node_0_1_2.set_upstream(node_0_1)
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Wed, Aug 24, 2016 at 11:04 PM, Laura Lorenz <
>> llorenz@industrydive.com>
>> > wrote:
>> >
>> >> I don't think this necessarily answers your question, but one thing I
>> >> noticed is that you are using a dynamic start_date, when you should be
>> >> using a fixed one. From the FAQs
>> >> <https://pythonhosted.org/airflow/faq.html#what-s-the-deal-with-start-date>:
>> >>
>> >>> We recommend against using dynamic values as start_date, especially
>> >>> datetime.now() as it can be quite confusing. The task is triggered once
>> >>> the period closes, and in theory an @hourly DAG would never get to an hour
>> >>> after now as now() moves along.
>> >>
>> >>
>> >> More to the point, what specifically do you mean by "always running" and
>> >> "fires every cycle"? For example, is what you are seeing a new task instance
>> >> with a new execution date on every run of the scheduler, i.e. from the
>> >> Browse > Task Instances UI?
>> >>
>> >> On Tue, Aug 23, 2016 at 5:27 PM, David Montgomery <
>> >> davidmontgomery@gmail.com
>> >>> wrote:
>> >>
>> >>> even @hourly is not working.  Fires every cycle. wow
>> >>>
>> >>> On Wed, Aug 24, 2016 at 5:09 AM, David Montgomery <
>> >>> davidmontgomery@gmail.com
>> >>>> wrote:
>> >>>
>> >>>> I updated the DAG.  In the UI I see 0 * * * * in the schedule field
>> >>>>
>> >>>>
>> >>>>
>> >>>> default_args = {
>> >>>>    'owner': 'airflow',
>> >>>>    'depends_on_past': False,
>> >>>>    "start_date": datetime.now(),
>> >>>>    'email': ['test@gmail.com'],
>> >>>>    'email_on_failure': True,
>> >>>>    'email_on_retry': True,
>> >>>>    'retries': 1,
>> >>>>    'retry_delay': timedelta(minutes=5)
>> >>>> }
>> >>>>
>> >>>>
>> >>>>
>> >>>> dag = DAG('first_test', schedule_interval="0 * * * *",
>> >>>> default_args=default_args)
>> >>>>
>> >>>> node_0 = PythonOperator(
>> >>>>    task_id='isnewdata',
>> >>>>    provide_context=False,
>> >>>>    python_callable=checkfornewdata,
>> >>>>    dag=dag)
>> >>>>
>> >>>>
>> >>>> node_0_1 = PythonOperator(
>> >>>>    task_id='fetchdata',
>> >>>>    provide_context=False,
>> >>>>    python_callable=fetchdata,
>> >>>>    dag=dag)
>> >>>>
>> >>>> node_0_1_2 = PythonOperator(
>> >>>>    task_id='uploadtoes',
>> >>>>    provide_context=False,
>> >>>>    python_callable= uploadtoes,
>> >>>>    dag=dag)
>> >>>>
>> >>>>
>> >>>
>> >>
>>
>>
>
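
To make the advice in the quoted replies above concrete, here is a small
sketch in plain Python (no Airflow required). It only shows that a start_date
built from datetime.now() yields a different value on each evaluation, while
an absolute date does not. The function names dynamic_start_date and
fixed_start_date are made up for this illustration, and the assumption that
the DAG file is evaluated repeatedly is taken from Bolke's reply.

```python
from datetime import datetime, timedelta
import time


def dynamic_start_date():
    # Mirrors the pattern from the thread: recomputed on every evaluation.
    return datetime.now() - timedelta(minutes=5)


def fixed_start_date():
    # An absolute value: identical no matter when it is evaluated.
    return datetime(2016, 8, 23)


first_dynamic = dynamic_start_date()
time.sleep(0.05)  # stand-in for the gap between two evaluations of the DAG file
second_dynamic = dynamic_start_date()

print("dynamic start_date moved:", second_dynamic > first_dynamic)
print("fixed start_date stable:", fixed_start_date() == fixed_start_date())
```

In a DAG definition, the fixed form would presumably go into default_args as
"start_date": datetime(2016, 8, 23), the value mentioned at the top of this
message.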
