airflow-commits mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-92) Tasks not being retried at all due to a 'obj not bound to a Session' exception
Date Tue, 10 May 2016 13:13:13 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278062#comment-15278062 ]

ASF subversion and git services commented on AIRFLOW-92:
--------------------------------------------------------

Commit dddfd3b5bf2cabaac6eec123dfa3cb59e73a56f5 in incubator-airflow's branch refs/heads/master
from [~bolke]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=dddfd3b ]

AIRFLOW-92 Avoid unneeded upstream_failed session closes apache/incubator-airflow#1485


> Tasks not being retried at all due to a 'obj not bound to a Session' exception
> ------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-92
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-92
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: Airflow 1.7.0
>         Environment: EC2 t2.medium instance, 
> Docker `version 1.11.1, build 5604cbe`,
> Host is `Linux ip-172-31-44-140 3.13.0-85-generic #129-Ubuntu SMP Thu Mar 17 20:50:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux`,
> Docker containers are built upon the `python:3.5` image, 
> LocalExecutor is used with two scheduler containers running
>            Reporter: Bence Nagy
>            Priority: Critical
>
> I have some tasks that are stuck in {{up_for_retry}} state; below is an extract from the database (here it is in a [Google Drive spreadsheet|https://docs.google.com/spreadsheets/d/14dtb3zYa583V1SaLcpOq6hDM4ThCeN7JhHjftRwKxbI/edit?usp=sharing] with better formatting):
> {code}
> task_id	dag_id	execution_date	start_date	end_date	duration	state	try_number	hostname	unixname	job_id	pool	queue	priority_weight	operator	queued_dttm	id	dag_id	state	job_type	start_date	end_date	latest_heartbeat	executor_class	hostname	unixname	id	dag_id	execution_date	state	run_id	external_trigger	conf	end_date	start_date
> task_a	dag_a1	2016-05-09 08:00:00.000000	2016-05-09 12:00:12.382775	2016-05-09 12:01:12.473914	60.091139	up_for_retry	1	d5593c115c22	root	46266		default	4	ExternalTaskSensor		46266		success	LocalTaskJob	2016-05-09 12:00:08.195711	2016-05-09 12:01:13.261937	2016-05-09 12:00:08.195732	LocalExecutor	d5593c115c22	root	17799	dag_a1	2016-05-09 08:00:00.000000	failed	scheduled__2016-05-09T08:00:00	false			2016-05-09 12:00:04.406875
> task_a	dag_a2	2016-05-09 10:00:00.000000	2016-05-09 12:00:13.102094	2016-05-09 12:01:13.185960	60.083866	up_for_retry	1	d5593c115c22	root	46270		default	4	ExternalTaskSensor		46270		success	LocalTaskJob	2016-05-09 12:00:08.896527	2016-05-09 12:01:13.960936	2016-05-09 12:00:08.896550	LocalExecutor	d5593c115c22	root	17800	dag_a2	2016-05-09 10:00:00.000000	failed	scheduled__2016-05-09T10:00:00	false			2016-05-09 12:00:04.531888
> task_b	dag_b	2016-04-07 18:00:00.000000	2016-05-09 12:53:59.990395	2016-05-09 12:54:00.393259	0.402864	up_for_retry	1	0a8613c2b5d2	root	46366		default	1	PostgresOperator		46366		success	LocalTaskJob	2016-05-09 12:53:58.881987	2016-05-09 12:54:03.891450	2016-05-09 12:53:58.882006	LocalExecutor	0a8613c2b5d2	root	17836	dag_b	2016-04-07 18:00:00.000000	running	scheduled__2016-04-07T18:00:00	false			2016-05-09 12:51:59.713718
> task_c	dag_b	2016-04-07 16:00:00.000000	2016-05-09 12:53:49.822634	2016-05-09 12:54:49.924291	60.101657	up_for_retry	1	0a8613c2b5d2	root	46359		default	2	ExternalTaskSensor		46359		success	LocalTaskJob	2016-05-09 12:53:44.739355	2016-05-09 12:54:54.810579	2016-05-09 12:53:44.739575	LocalExecutor	0a8613c2b5d2	root	17831	dag_b	2016-04-07 16:00:00.000000	running	scheduled__2016-04-07T16:00:00	false			2016-05-09 12:51:55.078050
> {code}
> I'm getting the following exception which seems to be halting the scheduler just before it could queue the tasks for retrying:
> {code}
> [2016-05-10 09:42:33,562] {jobs.py:706} ERROR - Instance <DagRun at 0x7f48a6b87550> is not bound to a Session; attribute refresh operation cannot proceed
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.5/site-packages/airflow/jobs.py", line 703, in _do_dags
>     self.process_dag(dag, tis_out)
>   File "/usr/local/lib/python3.5/site-packages/airflow/jobs.py", line 507, in process_dag
>     active_runs = dag.get_active_runs()
>   File "/usr/local/lib/python3.5/site-packages/airflow/models.py", line 2731, in get_active_runs
>     active_dates.append(run.execution_date)
>   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/attributes.py", line 237, in __get__
>     return self.impl.get(instance_state(instance), dict_)
>   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/attributes.py", line 578, in get
>     value = state._load_expired(state, passive)
>   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/state.py", line 474, in _load_expired
>     self.manager.deferred_scalar_loader(self, toload)
>   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/loading.py", line 610, in load_scalar_attributes
>     (state_str(state)))
> sqlalchemy.orm.exc.DetachedInstanceError: Instance <DagRun at 0x7f48a6b87550> is not bound to a Session; attribute refresh operation cannot proceed
> {code}
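> For anyone trying to pin this down: here is a minimal standalone sketch (plain SQLAlchemy, not Airflow code; the {{DagRun}} model below is a hypothetical stand-in for {{airflow.models.DagRun}}) that reproduces the behaviour in the traceback. {{commit()}} expires every loaded attribute, and once the instance is detached from its Session, the refresh triggered by the next attribute access has nowhere to load from:
> {code}
> from sqlalchemy import Column, Integer, String, create_engine
> from sqlalchemy.ext.declarative import declarative_base
> from sqlalchemy.orm import sessionmaker
>
> Base = declarative_base()
>
> class DagRun(Base):  # stand-in for airflow.models.DagRun
>     __tablename__ = 'dag_run'
>     id = Column(Integer, primary_key=True)
>     execution_date = Column(String)
>
> engine = create_engine('sqlite://')
> Base.metadata.create_all(engine)
>
> session = sessionmaker(bind=engine)()  # expire_on_commit=True by default
> run = DagRun(execution_date='2016-05-09 08:00:00')
> session.add(run)
> session.commit()  # expires run's loaded attributes
> session.close()   # detaches run from the session
>
> run.execution_date  # raises sqlalchemy.orm.exc.DetachedInstanceError
> {code}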
> I've managed to fix this by removing all {{ti.are_dependencies_met()}} calls that have a commit at the end; after doing this there are no exceptions and the tasks are retried correctly.
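> For comparison, two generic ways to sidestep the detached refresh (a sketch only, not the patch referenced in the commit above), reusing the stand-in model and engine from the sketch earlier: copy the attribute out while the object is still attached, or build the session with {{expire_on_commit=False}} so {{commit()}} leaves loaded state in place:
> {code}
> # Option 1: copy the value out while the instance is still attached.
> session = sessionmaker(bind=engine)()
> run = session.query(DagRun).first()
> execution_date = run.execution_date  # plain value, safe after close
> session.commit()
> session.close()
> print(execution_date)
>
> # Option 2: stop commit() from expiring loaded attributes.
> session = sessionmaker(bind=engine, expire_on_commit=False)()
> run = session.query(DagRun).first()
> session.commit()
> session.close()
> print(run.execution_date)  # still populated; no refresh, no error
> {code}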



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
