airflow-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hongchang Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace
Date Mon, 26 Feb 2018 14:08:01 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16376911#comment-16376911
] 

Hongchang Li commented on AIRFLOW-401:
--------------------------------------

same issue, with airflow 1.8.2, python 2.7, with _parallelism_ =X child processes spawned,
and one  got "defunct" status.

Add more logs of the stuck workflow, seems that it was related to DB deadlock. 
{code:java}
[2018-02-26 19:13:53,071] {base_task_runner.py:95} INFO - Subtask: Starting attempt 1 of 2
[2018-02-26 19:13:53,071] {base_task_runner.py:95} INFO - Subtask: --------------------------------------------------------------------------------
[2018-02-26 19:13:53,071] {base_task_runner.py:95} INFO - Subtask: 
[2018-02-26 19:13:53,082] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,081]
{models.py:1358} INFO - Executing <Task(SubDagOperator): HourLoginInfo> on 2018-02-26
18:10:00
[2018-02-26 19:13:53,294] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,293]
{models.py:1128} INFO - Dependencies all met for <TaskInstance: xxxxx.HourLoginInfo.HourLoginInfo-70010000
2018-02-26 18:10:00 [scheduled]>
[2018-02-26 19:13:53,301] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,301]
{base_executor.py:50} INFO - Adding to queue: airflow run xxxxx.HourLoginInfo HourLoginInfo-70010000
2018-02-26T18:10:00 --local --pool cp_backfill -sd DAGS_FOLDER/xxxxx.py
[2018-02-26 19:13:53,320] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,319]
{models.py:1128} INFO - Dependencies all met for <TaskInstance: xxxxx.HourLoginInfo.HourLoginInfo-70030000
2018-02-26 18:10:00 [scheduled]>
[2018-02-26 19:13:53,330] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,330]
{base_executor.py:50} INFO - Adding to queue: airflow run xxxxx.HourLoginInfo HourLoginInfo-70030000
2018-02-26T18:10:00 --local --pool cp_backfill -sd DAGS_FOLDER/xxxxx.py
[2018-02-26 19:13:53,373] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,336]
{models.py:1433} ERROR - (_mysql_exceptions.OperationalError) (1213, 'Deadlock found when
trying to get lock; try restarting transaction') [SQL: u'UPDATE task_instance SET state=%s
WHERE task_instance.task_id = %s AND task_instance.dag_id = %s AND task_instance.execution_date
= %s'] [parameters: (u'queued', 'HourLoginInfo-70030000', 'xxxxx.HourLoginInfo', datetime.datetime(2018,
2, 26, 18, 10))] (Background on this error at: http://sqlalche.me/e/e3q8)
[2018-02-26 19:13:53,374] {base_task_runner.py:95} INFO - Subtask: Traceback (most recent
call last):
[2018-02-26 19:13:53,374] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/models.py",
line 1390, in run
[2018-02-26 19:13:53,374] {base_task_runner.py:95} INFO - Subtask: result = task_copy.execute(context=context)
[2018-02-26 19:13:53,374] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/operators/subdag_operator.py",
line 88, in execute
[2018-02-26 19:13:53,374] {base_task_runner.py:95} INFO - Subtask: executor=self.executor)
[2018-02-26 19:13:53,375] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/models.py",
line 3414, in run
[2018-02-26 19:13:53,375] {base_task_runner.py:95} INFO - Subtask: job.run()
[2018-02-26 19:13:53,375] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/jobs.py",
line 201, in run
[2018-02-26 19:13:53,375] {base_task_runner.py:95} INFO - Subtask: self._execute()
[2018-02-26 19:13:53,376] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/jobs.py",
line 1944, in _execute
[2018-02-26 19:13:53,376] {base_task_runner.py:95} INFO - Subtask: session.commit()
[2018-02-26 19:13:53,376] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py",
line 943, in commit
[2018-02-26 19:13:53,376] {base_task_runner.py:95} INFO - Subtask: self.transaction.commit()
[2018-02-26 19:13:53,376] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py",
line 467, in commit
[2018-02-26 19:13:53,376] {base_task_runner.py:95} INFO - Subtask: self._prepare_impl()
[2018-02-26 19:13:53,377] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py",
line 447, in _prepare_impl
[2018-02-26 19:13:53,377] {base_task_runner.py:95} INFO - Subtask: self.session.flush()
[2018-02-26 19:13:53,377] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py",
line 2243, in flush
[2018-02-26 19:13:53,377] {base_task_runner.py:95} INFO - Subtask: self._flush(objects)
[2018-02-26 19:13:53,377] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py",
line 2369, in _flush
[2018-02-26 19:13:53,378] {base_task_runner.py:95} INFO - Subtask: transaction.rollback(_capture_exception=True)
[2018-02-26 19:13:53,378] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py",
line 66, in __exit__
[2018-02-26 19:13:53,378] {base_task_runner.py:95} INFO - Subtask: compat.reraise(exc_type,
exc_value, exc_tb)
[2018-02-26 19:13:53,378] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py",
line 2333, in _flush
[2018-02-26 19:13:53,378] {base_task_runner.py:95} INFO - Subtask: flush_context.execute()
[2018-02-26 19:13:53,379] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py",
line 391, in execute
[2018-02-26 19:13:53,379] {base_task_runner.py:95} INFO - Subtask: rec.execute(self)
[2018-02-26 19:13:53,379] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py",
line 556, in execute
[2018-02-26 19:13:53,379] {base_task_runner.py:95} INFO - Subtask: uow
[2018-02-26 19:13:53,379] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py",
line 177, in save_obj
[2018-02-26 19:13:53,379] {base_task_runner.py:95} INFO - Subtask: mapper, table, update)
[2018-02-26 19:13:53,379] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py",
line 768, in _emit_update_statements
[2018-02-26 19:13:53,380] {base_task_runner.py:95} INFO - Subtask: execute(statement, multiparams)
[2018-02-26 19:13:53,380] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py",
line 948, in execute
[2018-02-26 19:13:53,380] {base_task_runner.py:95} INFO - Subtask: return meth(self, multiparams,
params)
[2018-02-26 19:13:53,380] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py",
line 269, in _execute_on_connection
[2018-02-26 19:13:53,380] {base_task_runner.py:95} INFO - Subtask: return connection._execute_clauseelement(self,
multiparams, params)
[2018-02-26 19:13:53,380] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py",
line 1060, in _execute_clauseelement
[2018-02-26 19:13:53,381] {base_task_runner.py:95} INFO - Subtask: compiled_sql, distilled_params
[2018-02-26 19:13:53,381] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py",
line 1200, in _execute_context
[2018-02-26 19:13:53,381] {base_task_runner.py:95} INFO - Subtask: context)
[2018-02-26 19:13:53,381] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py",
line 1413, in _handle_dbapi_exception
[2018-02-26 19:13:53,381] {base_task_runner.py:95} INFO - Subtask: exc_info
[2018-02-26 19:13:53,381] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py",
line 203, in raise_from_cause
[2018-02-26 19:13:53,381] {base_task_runner.py:95} INFO - Subtask: reraise(type(exception),
exception, tb=exc_tb, cause=cause)
[2018-02-26 19:13:53,382] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py",
line 1193, in _execute_context
[2018-02-26 19:13:53,382] {base_task_runner.py:95} INFO - Subtask: context)
[2018-02-26 19:13:53,382] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py",
line 507, in do_execute
[2018-02-26 19:13:53,382] {base_task_runner.py:95} INFO - Subtask: cursor.execute(statement,
parameters)
[2018-02-26 19:13:53,382] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib/python2.7/site-packages/MySQLdb/cursors.py",
line 250, in execute
[2018-02-26 19:13:53,382] {base_task_runner.py:95} INFO - Subtask: self.errorhandler(self,
exc, value)
[2018-02-26 19:13:53,383] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib/python2.7/site-packages/MySQLdb/connections.py",
line 42, in defaulterrorhandler
[2018-02-26 19:13:53,383] {base_task_runner.py:95} INFO - Subtask: raise errorvalue
[2018-02-26 19:13:53,383] {base_task_runner.py:95} INFO - Subtask: OperationalError: (_mysql_exceptions.OperationalError)
(1213, 'Deadlock found when trying to get lock; try restarting transaction') [SQL: u'UPDATE
task_instance SET state=%s WHERE task_instance.task_id = %s AND task_instance.dag_id = %s
AND task_instance.execution_date = %s'] [parameters: (u'queued', 'HourLoginInfo-70030000',
'xxxxx.HourLoginInfo', datetime.datetime(2018, 2, 26, 18, 10))] (Background on this error
at: http://sqlalche.me/e/e3q8)
[2018-02-26 19:13:53,383] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,374]
{models.py:1449} INFO - Marking task as UP_FOR_RETRY
[2018-02-26 19:13:53,383] {base_task_runner.py:95} INFO - Subtask: Traceback (most recent
call last):
[2018-02-26 19:13:53,383] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/bin/airflow",
line 28, in <module>
[2018-02-26 19:13:53,384] {base_task_runner.py:95} INFO - Subtask: args.func(args)
[2018-02-26 19:13:53,384] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/bin/cli.py",
line 422, in run
[2018-02-26 19:13:53,384] {base_task_runner.py:95} INFO - Subtask: pool=args.pool,
[2018-02-26 19:13:53,384] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/utils/db.py",
line 53, in wrapper
[2018-02-26 19:13:53,384] {base_task_runner.py:95} INFO - Subtask: result = func(*args, **kwargs)
[2018-02-26 19:13:53,384] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/models.py",
line 1403, in run
[2018-02-26 19:13:53,385] {base_task_runner.py:95} INFO - Subtask: self.handle_failure(e,
test_mode, context)
[2018-02-26 19:13:53,385] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/models.py",
line 1476, in handle_failure
[2018-02-26 19:13:53,385] {base_task_runner.py:95} INFO - Subtask: session.merge(self)
[2018-02-26 19:13:53,385] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py",
line 1915, in merge
[2018-02-26 19:13:53,385] {base_task_runner.py:95} INFO - Subtask: _resolve_conflict_map=_resolve_conflict_map)
[2018-02-26 19:13:53,386] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py",
line 1969, in _merge
[2018-02-26 19:13:53,386] {base_task_runner.py:95} INFO - Subtask: merged = self.query(mapper.class_).get(key[1])
[2018-02-26 19:13:53,386] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py",
line 871, in get
[2018-02-26 19:13:53,386] {base_task_runner.py:95} INFO - Subtask: ident, loading.load_on_ident)
[2018-02-26 19:13:53,386] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py",
line 905, in _get_impl
[2018-02-26 19:13:53,387] {base_task_runner.py:95} INFO - Subtask: return fallback_fn(self,
key)
[2018-02-26 19:13:53,387] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/loading.py",
line 231, in load_on_ident
[2018-02-26 19:13:53,387] {base_task_runner.py:95} INFO - Subtask: return q.one()
[2018-02-26 19:13:53,387] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py",
line 2837, in one
[2018-02-26 19:13:53,387] {base_task_runner.py:95} INFO - Subtask: ret = self.one_or_none()
[2018-02-26 19:13:53,388] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py",
line 2807, in one_or_none
[2018-02-26 19:13:53,388] {base_task_runner.py:95} INFO - Subtask: ret = list(self)
[2018-02-26 19:13:53,388] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py",
line 2878, in __iter__
[2018-02-26 19:13:53,388] {base_task_runner.py:95} INFO - Subtask: return self._execute_and_instances(context)
[2018-02-26 19:13:53,388] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py",
line 2899, in _execute_and_instances
[2018-02-26 19:13:53,388] {base_task_runner.py:95} INFO - Subtask: close_with_result=True)
[2018-02-26 19:13:53,389] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py",
line 2908, in _get_bind_args
[2018-02-26 19:13:53,389] {base_task_runner.py:95} INFO - Subtask: **kw
[2018-02-26 19:13:53,389] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py",
line 2890, in _connection_from_session
[2018-02-26 19:13:53,389] {base_task_runner.py:95} INFO - Subtask: conn = self.session.connection(**kw)
[2018-02-26 19:13:53,390] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py",
line 1035, in connection
[2018-02-26 19:13:53,390] {base_task_runner.py:95} INFO - Subtask: execution_options=execution_options)
[2018-02-26 19:13:53,390] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py",
line 1040, in _connection_for_bind
[2018-02-26 19:13:53,390] {base_task_runner.py:95} INFO - Subtask: engine, execution_options)
[2018-02-26 19:13:53,390] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py",
line 388, in _connection_for_bind
[2018-02-26 19:13:53,391] {base_task_runner.py:95} INFO - Subtask: self._assert_active()
[2018-02-26 19:13:53,391] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py",
line 276, in _assert_active
[2018-02-26 19:13:53,391] {base_task_runner.py:95} INFO - Subtask: % self._rollback_exception
[2018-02-26 19:13:53,392] {base_task_runner.py:95} INFO - Subtask: sqlalchemy.exc.InvalidRequestError:
This Session's transaction has been rolled back due to a previous exception during flush.
To begin a new transaction with this Session, first issue Session.rollback(). Original exception
was: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found when trying to get lock;
try restarting transaction') [SQL: u'UPDATE task_instance SET state=%s WHERE task_instance.task_id
= %s AND task_instance.dag_id = %s AND task_instance.execution_date = %s'] [parameters: (u'queued',
'HourLoginInfo-70030000', 'xxxxx.HourLoginInfo', datetime.datetime(2018, 2, 26, 18, 10))]
(Background on this error at: http://sqlalche.me/e/e3q8)
[2018-02-26 19:13:53,392] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,392] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,392] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,392] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,392] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,392] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,393] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,393] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,393] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,393] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,393] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,393] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,393] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,394] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,394] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,394] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,394] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,394] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,394] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,392]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,394] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,392]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,395] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,392]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,395] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,392]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,395] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,392]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,395] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,392]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,395] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,392]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,395] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,392]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,395] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,392]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,396] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,396] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,396] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 19:13:53,391]
{models.py:1367} ERROR - Killing subprocess
[2018-02-26 19:13:53,398] {base_task_runner.py:95} INFO - Subtask: Process LocalWorker-4:
[2018-02-26 19:13:53,398] {base_task_runner.py:95} INFO - Subtask: Process LocalWorker-1:
[2018-02-26 19:13:53,398] {base_task_runner.py:95} INFO - Subtask: Traceback (most recent
call last):
[2018-02-26 19:13:53,398] {base_task_runner.py:95} INFO - Subtask: Traceback (most recent
call last):
[2018-02-26 19:13:53,398] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/multiprocessing/process.py",
line 258, in _bootstrap
[2018-02-26 19:13:53,398] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/multiprocessing/process.py",
line 258, in _bootstrap
[2018-02-26 19:13:53,398] {base_task_runner.py:95} INFO - Subtask: self.run()
[2018-02-26 19:13:53,399] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/executors/local_executor.py",
line 39, in run
[2018-02-26 19:13:53,399] {base_task_runner.py:95} INFO - Subtask: self.run()
[2018-02-26 19:13:53,399] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/executors/local_executor.py",
line 39, in run
[2018-02-26 19:13:53,403] {base_task_runner.py:95} INFO - Subtask: key, command = self.task_queue.get()
[2018-02-26 19:13:53,403] {base_task_runner.py:95} INFO - Subtask: key, command = self.task_queue.get()
[2018-02-26 19:13:53,403] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/multiprocessing/queues.py",
line 117, in get
[2018-02-26 19:13:53,404] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib64/python2.7/multiprocessing/queues.py",
line 117, in get
[2018-02-26 19:13:53,407] {base_task_runner.py:95} INFO - Subtask: res = self._recv()
[2018-02-26 19:13:53,407] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/models.py",
line 1369, in signal_handler
[2018-02-26 19:13:53,407] {base_task_runner.py:95} INFO - Subtask: res = self._recv()
[2018-02-26 19:13:53,407] {base_task_runner.py:95} INFO - Subtask: File "/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/models.py",
line 1369, in signal_handler
[2018-02-26 19:13:53,408] {base_task_runner.py:95} INFO - Subtask: raise AirflowException("Task
received SIGTERM signal")
[2018-02-26 19:13:53,408] {base_task_runner.py:95} INFO - Subtask: raise AirflowException("Task
received SIGTERM signal")
[2018-02-26 19:13:53,408] {base_task_runner.py:95} INFO - Subtask: AirflowException: Task
received SIGTERM signal
[2018-02-26 19:13:53,408] {base_task_runner.py:95} INFO - Subtask: AirflowException: Task
received SIGTERM signal
{code}
 

 

> scheduler gets stuck without a trace
> ------------------------------------
>
>                 Key: AIRFLOW-401
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-401
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: executor, scheduler
>    Affects Versions: Airflow 1.7.1.3
>            Reporter: Nadeem Ahmed Nazeer
>            Assignee: Bolke de Bruin
>            Priority: Minor
>              Labels: celery, kombu
>         Attachments: Dag_code.txt, schduler_cpu100%.png, scheduler_stuck.png, scheduler_stuck_7hours.png
>
>
> The scheduler gets stuck without a trace or error. When this happens, the CPU usage of
scheduler service is at 100%. No jobs get submitted and everything comes to a halt. Looks
it goes into some kind of infinite loop. 
> The only way I could make it run again is by manually restarting the scheduler service.
But again, after running some tasks it gets stuck. I've tried with both Celery and Local executors
but same issue occurs. I am using the -n 3 parameter while starting scheduler. 
> Scheduler configs,
> job_heartbeat_sec = 5
> scheduler_heartbeat_sec = 5
> executor = LocalExecutor
> parallelism = 32
> Please help. I would be happy to provide any other information needed



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message