airflow-commits mailing list archives

From "nathan warshauer (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AIRFLOW-1868) Packaged Dags not added to dag table, unable to execute tasks
Date Wed, 29 Nov 2017 22:48:01 GMT
nathan warshauer created AIRFLOW-1868:
-----------------------------------------

             Summary: Packaged Dags not added to dag table, unable to execute tasks
                 Key: AIRFLOW-1868
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1868
             Project: Apache Airflow
          Issue Type: Bug
         Environment: airflow 1.8.2, celery, rabbitMQ, mySQL, aws
            Reporter: nathan warshauer
         Attachments: Screen Shot 2017-11-29 at 2.31.02 PM.png, Screen Shot 2017-11-29 at
4.40.39 PM.png, Screen Shot 2017-11-29 at 4.42.39 PM.png

.zip files in the dag directory do not appear to be getting added to the dag table in the
airflow database.  When a .zip file containing executable .py files is placed in the dags
folder, the dag_id should be added to the dag table and airflow should allow the dag to be
unpaused and run through the web server.
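For reference, a minimal sketch of how such a packaged dag can be built (the file name
my_dag.py and the target path are placeholders, not the actual deployment):

# Minimal sketch of packaging a dag into a .zip (placeholder names and paths).
# Airflow scans .py files at the root of the archive for dag definitions,
# so the dag module is written to the top level of the zip, not a subdirectory.
import zipfile

with zipfile.ZipFile('/path/to/dags/my_dag.zip', 'w') as zf:
    zf.write('my_dag.py', arcname='my_dag.py')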
SELECT distinct dag.dag_id AS dag_dag_id FROM dag confirms the dag does not exist in the dag
table, yet it shows up in the UI with the warning message "This Dag seems to be existing only
locally", even though the dag exists in all 3 dag directories (master and two workers) and
airflow.cfg has donot_pickle = True.
When the dag is triggered manually via airflow trigger_dag <dag_id>, the run appears in the
web server but no tasks execute.  When I go to the task and click start through the UI, the
task executes successfully and shows the attached state upon completion.  When I skip that
step, the tasks never enter the queue and the run sits idle, as the third attached image shows.
Basically, the dag CAN run manually from the zip, BUT the scheduler and underlying database
tables do not appear to be functioning correctly for packaged dags.
Please let me know if I can provide any additional information about this issue, or if you
have any leads I can check out for resolving it.

from datetime import datetime, timedelta

from airflow import DAG

# Default arguments applied to every task in this dag
default_args = {
  'depends_on_past': False,
  'email': ['airflow@airflow.com'],
  'email_on_failure': True,
  'email_on_retry': False,
  'owner': 'airflow',
  'provide_context': True,
  'retries': 0,
  'retry_delay': timedelta(minutes=5),
  'start_date': datetime(2017, 11, 28)
}

# The dag is defined after default_args so the dict exists when it is referenced
dag = DAG('MY-DAG-NAME',
  default_args=default_args,
  schedule_interval='*/5 * * * *',
  max_active_runs=1,
  dagrun_timeout=timedelta(minutes=4, seconds=30))
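
A minimal placeholder task attached to the dag above, just to illustrate how tasks pick up
the dag object (the actual operators inside the packaged dag are not shown in this snippet):

from airflow.operators.dummy_operator import DummyOperator

# Illustrative placeholder only; the real tasks in the packaged dag differ.
placeholder = DummyOperator(task_id='placeholder_task', dag=dag)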



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
