airflow-dev mailing list archives

From Chris Fei <cfe...@gmail.com>
Subject Re: regression from a month-old Airflow version
Date Sat, 06 May 2017 16:47:40 GMT
I wonder if your issue is the same root cause as AIRFLOW-1013[1] (which
you seem to have reported) and AIRFLOW-1055[2]. I haven't tried it
myself, but that second ticket seems to indicate that a workaround
could be setting catchup = True on your DAG. Not sure if that's an
option for you.
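For what it's worth, a minimal sketch of that workaround (the `DAG` stand-in below is hypothetical, only there to show the changed argument without importing Airflow; in your real DAG file you'd just pass `catchup=True` to the existing `DAG(...)` constructor):

```python
from collections import namedtuple

# Hypothetical stand-in for airflow.models.DAG, just to illustrate
# the single argument change suggested in AIRFLOW-1055.
DAG = namedtuple("DAG", ["dag_id", "schedule_interval", "catchup"])

main_dag = DAG(
    dag_id="DISCOVER-Oracle-Load-Mar2017-v1",
    schedule_interval="@once",
    catchup=True,  # workaround: was catchup=False
)
```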
On Sat, May 6, 2017, at 12:29 PM, Ruslan Dautkhanov wrote:
> I've upgraded Airflow to today's master branch.
> 
> Got the following regression when attempting to start a DAG:
> 
> Process DagFileProcessor209-Process:
> Traceback (most recent call last):
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
>     self.run()
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", line 114, in run
>     self._target(*self._args, **self._kwargs)
>   File "/opt/airflow/airflow-20170506/src/airflow/airflow/jobs.py", line 346, in helper
>     pickle_dags)
>   File "/opt/airflow/airflow-20170506/src/airflow/airflow/utils/db.py", line 48, in wrapper
>     result = func(*args, **kwargs)
>   File "/opt/airflow/airflow-20170506/src/airflow/airflow/jobs.py", line 1584, in process_file
>     self._process_dags(dagbag, dags, ti_keys_to_schedule)
>   File "/opt/airflow/airflow-20170506/src/airflow/airflow/jobs.py", line 1173, in _process_dags
>     dag_run = self.create_dag_run(dag)
>   File "/opt/airflow/airflow-20170506/src/airflow/airflow/utils/db.py", line 48, in wrapper
>     result = func(*args, **kwargs)
>   File "/opt/airflow/airflow-20170506/src/airflow/airflow/jobs.py", line 776, in create_dag_run
>     if next_start <= now:
> TypeError: can't compare datetime.datetime to NoneType
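
If I'm reading the traceback right, create_dag_run computes next_start as None (plausibly because of '@once' plus catchup=False) and then compares it against now, which Python's datetime refuses. A small sketch of a defensive guard, assuming that reading; the function name is mine, not Airflow's actual fix:

```python
from datetime import datetime

def safe_next_run(next_start, now):
    """Hypothetical guard around the comparison that fails in
    create_dag_run: next_start can be None when the scheduler has
    no further run to compute, and None can't be ordered against
    a datetime."""
    if next_start is None:
        return False  # nothing further to schedule
    return next_start <= now
```

With the guard, the failing case from the traceback simply schedules nothing instead of raising.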
> 
> 
> 
> DAG definition:
> 
> main_dag = DAG(
>     dag_id              = 'DISCOVER-Oracle-Load-Mar2017-v1',
>     default_args        = default_args,       # default operators' arguments - see above
>     user_defined_macros = dag_macros,         # I don't get the difference between
>     ## params           = dag_macros,         # user_defined_macros and params
>     start_date          = datetime.now(),     # or e.g. datetime(2015, 6, 1)
>     # 'end_date'        = datetime(2016, 1, 1),
>     catchup             = False,              # perform scheduler catchup (or only run latest)?
>                                               # - defaults to True
>     schedule_interval   = '@once',            # '@once'=None?
>                                               # doesn't create multiple dag runs automatically
>     concurrency         = 3,                  # task instances allowed to run concurrently
>     max_active_runs     = 1,                  # only one DAG run at a time
>     dagrun_timeout      = timedelta(days=4),  # no way this dag should run for 4 days
>     orientation         = 'TB',               # default graph view
> )
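
One thing I'd flag in the DAG above, independent of the regression: start_date=datetime.now() is dynamic, and the Airflow docs recommend a static start_date so the schedule doesn't shift every time the scheduler re-parses the file. A sketch of the change (the date itself is illustrative):

```python
from datetime import datetime

# Static start_date: identical on every parse of the DAG file,
# unlike datetime.now(), which changes on each re-read.
START_DATE = datetime(2017, 3, 1)

dag_kwargs = dict(
    dag_id="DISCOVER-Oracle-Load-Mar2017-v1",
    start_date=START_DATE,       # fixed, not datetime.now()
    schedule_interval="@once",
)
```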
> 
> 
> default_args:
> 
> default_args = {
>     # Security:
>     'owner'               : 'rdautkha',        # owner of the task; using the unix username is recommended
>     # 'run_as_user'       : None,              # unix username to impersonate while running the task
>     # Scheduling:
>     'start_date'          : None,              # don't confuse with DAG's start_date
>     'depends_on_past'     : False,             # True makes sense... but there are bugs around that code
>     'wait_for_downstream' : False,             # depends_on_past is forced to True if wait_for_downstream
>     'trigger_rule'        : 'all_success',     # all_success is default anyway
>     # Retries:
>     'retries'             : 0,                 # no retries
>     # 'retry_delay'       : timedelta(minutes=5),  # check retry_exponential_backoff and max_retry_delay too
>     # Timeouts and SLAs:
>     # 'sla'               : timedelta(hours=1),    # default tasks' sla - normally don't run longer
>     'execution_timeout'   : timedelta(hours=3),    # no single task runs 3 hours or more
>     # 'sla_miss_callback' :                        # function to call when reporting SLA timeouts
>     # Notifications:
>     'email'               : ['rdautkhanov@epsilon.com'],
>     'email_on_failure'    : True,
>     'email_on_retry'      : True,
>     # Resource usage:
>     'pool'                : 'DISCOVER-Prod',   # can increase this pool's concurrency
>     # 'queue'             : 'some_queue',
>     # 'priority_weight'   : 10,
>     # Miscellaneous:
>     # on_failure_callback=None, on_success_callback=None,
>     # on_retry_callback=None
> }
> 
> 
> The DAG itself has a bunch of Oracle operators.
> 
> Any ideas?
> 
> That's a regression from a month-old Airflow.
> No changes were made to the DAG.
> 
> 
> 
> Thank you,
> Ruslan Dautkhanov


Links:

  1. https://issues.apache.org/jira/browse/AIRFLOW-1013
  2. https://issues.apache.org/jira/browse/AIRFLOW-1055
