airflow-commits mailing list archives

From "Morten Post (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-1063) A manually-created DAG run can prevent a scheduled run to be created
Date Thu, 20 Sep 2018 14:26:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622127#comment-16622127 ]

Morten Post commented on AIRFLOW-1063:
--------------------------------------

I am seeing this issue as well. What backend are you using?

> A manually-created DAG run can prevent a scheduled run to be created
> --------------------------------------------------------------------
>
>                 Key: AIRFLOW-1063
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1063
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: 1.7.1.3
>            Reporter: Vitor Baptista
>            Priority: Major
>
> I manually created a DAG Run with the {{execution_date}} as {{2017-03-01 00:00:00}} on
> a monthly-recurrent DAG. After a while, I noticed that the scheduled run was never created
> and checked the scheduler's logs, finding this traceback:
> {quote}
> Process Process-475397:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
>     self.run()
>   File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
>     self._target(*self._args, **self._kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/jobs.py", line 664, in _do_dags
>     dag = dagbag.get_dag(dag.dag_id)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 188, in get_dag
>     orm_dag = DagModel.get_current(root_dag_id)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 2320, in get_current
>     obj = session.query(cls).filter(cls.dag_id == dag_id).first()
>   File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2690, in first
>     ret = list(self[0:1])
>   File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2482, in __getitem__
>     return list(res)
>   File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2790, in __iter__
>     return self._execute_and_instances(context)
>   File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2811, in _execute_and_instances
>     close_with_result=True)
>   File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2820, in _get_bind_args
>     **kw
>   File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2802, in _connection_from_session
>     conn = self.session.connection(**kw)
>   File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 966, in connection
>     execution_options=execution_options)
>   File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 971, in _connection_for_bind
>     engine, execution_options)
>   File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 382, in _connection_for_bind
>     self._assert_active()
>   File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 276, in _assert_active
>     % self._rollback_exception
> InvalidRequestError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (psycopg2.IntegrityError)
>  duplicate key value violates unique constraint "dag_run_dag_id_execution_date_key"
> DETAIL:  Key (dag_id, execution_date)=(nct, 2017-03-01 00:00:00) already exists.
>  [SQL: 'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, state, run_id, external_trigger, conf) VALUES (%(dag_id)s, %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, %(external_trigger)s, %(conf)s)
>  RETURNING dag_run.id'] [parameters: {'end_date': None, 'run_id': u'scheduled__2017-03-01T00:00:00', 'execution_date': datetime.datetime(2017, 3, 1, 0, 0), 'external_trigger': False, 'state': u'running', 'conf': None, 'start_date': datetime.datetime(2017, 4, 3, 13, 48, 39, 168456), 'dag_id': 'nct'}]
> {quote}
> The problem is that the {{dag_run}} table requires the {{(dag_id, execution_date)}} pair
> to be unique, so the scheduler was stuck in a loop where it tried to create a new scheduled
> DAG run but failed, because I had already created one with the same {{execution_date}}. This was
> surprising. As a user, I would expect it either to schedule the run normally, even
> if there is a manual one on the same date, or to skip that execution date.
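
The collision described above can be reproduced outside Airflow. The following is a minimal sketch, not Airflow's scheduler code: it assumes a simplified {{dag_run}} table with the same {{(dag_id, execution_date)}} unique constraint, uses Python's built-in sqlite3 instead of PostgreSQL, and introduces a hypothetical helper, {{create_scheduled_run}}, that rolls back and skips the date when a run already exists. The {{manual__}} run_id value is illustrative only.

```python
import sqlite3


def create_scheduled_run(conn, dag_id, execution_date):
    """Hypothetical helper: insert a scheduled run, or skip if one already exists.

    Returns True when the row was created and False when the unique
    constraint on (dag_id, execution_date) rejected the insert.
    """
    try:
        # "with conn" commits on success and rolls back on an exception,
        # so the connection stays usable after a failed insert.
        with conn:
            conn.execute(
                "INSERT INTO dag_run (dag_id, execution_date, run_id, external_trigger) "
                "VALUES (?, ?, ?, ?)",
                (dag_id, execution_date, "scheduled__" + execution_date, 0),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
# Simplified stand-in for the real table; only the constraint matters here.
conn.execute(
    """
    CREATE TABLE dag_run (
        id INTEGER PRIMARY KEY,
        dag_id TEXT NOT NULL,
        execution_date TEXT NOT NULL,
        run_id TEXT,
        external_trigger INTEGER,
        UNIQUE (dag_id, execution_date)
    )
    """
)

# A manually created run occupies the 2017-03-01 slot first.
with conn:
    conn.execute(
        "INSERT INTO dag_run (dag_id, execution_date, run_id, external_trigger) "
        "VALUES (?, ?, ?, ?)",
        ("nct", "2017-03-01T00:00:00", "manual__2017-03-01T00:00:00", 1),
    )

# The scheduled run for the same date collides; the next date still works.
created_march = create_scheduled_run(conn, "nct", "2017-03-01T00:00:00")
created_april = create_scheduled_run(conn, "nct", "2017-04-01T00:00:00")
row_count = conn.execute("SELECT COUNT(*) FROM dag_run").fetchone()[0]
```

In this sketch the failed insert is rolled back before the connection is reused, which is the step the {{InvalidRequestError}} in the traceback says was missing. Whether the scheduler should skip the date or allow both runs is the open question raised in the issue.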



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
