aurora-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bill Farner (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AURORA-1401) DB deadlock caused stuck job update
Date Thu, 16 Jul 2015 18:04:04 GMT

     [ https://issues.apache.org/jira/browse/AURORA-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Bill Farner updated AURORA-1401:
--------------------------------
    Story Points: 2

> DB deadlock caused stuck job update
> -----------------------------------
>
>                 Key: AURORA-1401
>                 URL: https://issues.apache.org/jira/browse/AURORA-1401
>             Project: Aurora
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Bill Farner
>            Assignee: Bill Farner
>
> We observed the symptom of a job update that was stuck in ROLLING_FORWARD, but was not
active.  Looking in scheduler logs revealed a DB-level deadlock:
> {noformat}
> E0715 19:04:53.085 THREAD1471 org.apache.aurora.scheduler.updater.JobUpdateEventSubscriber.startUp:
Failed to resume job updates: org.apache.aurora.scheduler.storage.Storage$StorageException:

> ### Error querying database.  Cause: org.h2.jdbc.JdbcSQLException: Deadlock detected.
The current transaction was rolled back. Details: "
> Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS
(shared) while locking PUBLIC.JOB_KEYS (exclusive).
> Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS
(shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS
(exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement:
> SELECT
>       lock_token
>     FROM job_update_locks AS l
>     INNER JOIN job_updates u ON l.update_row_id = u.id
>      
>     INNER JOIN job_keys AS j ON j.id = u.job_key_id
>    
>     WHERE  
>     u.update_id = ?
>     AND j.role = ?
>     AND j.environment = ?
>     AND j.name = ? [40001-187]
> ### The error may exist in org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
> ### The error may involve defaultParameterMap
> ### The error occurred while setting parameters
> ### SQL: SELECT       lock_token     FROM job_update_locks AS l     INNER JOIN job_updates
u ON l.update_row_id = u.id           INNER JOIN job_keys AS j ON j.id = u.job_key_id    
    WHERE       u.update_id = ?     AND j.role = ?     AND j.environment = ?     AND j.name
= ?
> ### Cause: org.h2.jdbc.JdbcSQLException: Deadlock detected. The current transaction was
rolled back. Details: "
> Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS
(shared) while locking PUBLIC.JOB_KEYS (exclusive).
> Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS
(shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS
(exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement:
> SELECT
>       lock_token
>     FROM job_update_locks AS l
>     INNER JOIN job_updates u ON l.update_row_id = u.id
>      
>     INNER JOIN job_keys AS j ON j.id = u.job_key_id
>    
>     WHERE  
>     u.update_id = ?
>     AND j.role = ?
>     AND j.environment = ?
>     AND j.name = ? [40001-187]
> org.apache.aurora.scheduler.storage.Storage$StorageException: 
> ### Error querying database.  Cause: org.h2.jdbc.JdbcSQLException: Deadlock detected.
The current transaction was rolled back. Details: "
> Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS
(shared) while locking PUBLIC.JOB_KEYS (exclusive).
> Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS
(shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS
(exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement:
> SELECT
>       lock_token
>     FROM job_update_locks AS l
>     INNER JOIN job_updates u ON l.update_row_id = u.id
>      
>     INNER JOIN job_keys AS j ON j.id = u.job_key_id
>    
>     WHERE  
>     u.update_id = ?
>     AND j.role = ?
>     AND j.environment = ?
>     AND j.name = ? [40001-187]
> ### The error may exist in org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
> ### The error may involve defaultParameterMap
> ### The error occurred while setting parameters
> ### SQL: SELECT       lock_token     FROM job_update_locks AS l     INNER JOIN job_updates
u ON l.update_row_id = u.id           INNER JOIN job_keys AS j ON j.id = u.job_key_id    
    WHERE       u.update_id = ?     AND j.role = ?     AND j.environment = ?     AND j.name
= ?
> ### Cause: org.h2.jdbc.JdbcSQLException: Deadlock detected. The current transaction was
rolled back. Details: "
> Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS
(shared) while locking PUBLIC.JOB_KEYS (exclusive).
> Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS
(shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS
(exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement:
> SELECT
>       lock_token
>     FROM job_update_locks AS l
>     INNER JOIN job_updates u ON l.update_row_id = u.id
>      
>     INNER JOIN job_keys AS j ON j.id = u.job_key_id
>    
>     WHERE  
>     u.update_id = ?
>     AND j.role = ?
>     AND j.environment = ?
>     AND j.name = ? [40001-187]
>         at org.apache.aurora.scheduler.storage.db.DbStorage.write(DbStorage.java:145)
>         at org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101)
>         at com.twitter.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:87)
>         at org.apache.aurora.scheduler.storage.log.LogStorage.doInTransaction(LogStorage.java:602)
>         at org.apache.aurora.scheduler.storage.log.LogStorage.write(LogStorage.java:635)
>         at org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.write(CallOrderEnforcingStorage.java:130)
>         at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.systemResume(JobUpdateControllerImpl.java:263)
>         at org.apache.aurora.scheduler.updater.JobUpdateEventSubscriber.startUp(JobUpdateEventSubscriber.java:85)
>         at com.google.common.util.concurrent.AbstractIdleService$2$1.run(AbstractIdleService.java:54)
>         at com.google.common.util.concurrent.Callables$3.run(Callables.java:93)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.ibatis.exceptions.PersistenceException: 
> ### Error querying database.  Cause: org.h2.jdbc.JdbcSQLException: Deadlock detected.
The current transaction was rolled back. Details: "
> Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS
(shared) while locking PUBLIC.JOB_KEYS (exclusive).
> Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS
(shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS
(exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement:
> SELECT
>       lock_token
>     FROM job_update_locks AS l
>     INNER JOIN job_updates u ON l.update_row_id = u.id
>      
>     INNER JOIN job_keys AS j ON j.id = u.job_key_id
>    
>     WHERE  
>     u.update_id = ?
>     AND j.role = ?
>     AND j.environment = ?
>     AND j.name = ? [40001-187]
> ### The error may exist in org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
> ### The error may involve defaultParameterMap
> ### The error occurred while setting parameters
> ### SQL: SELECT       lock_token     FROM job_update_locks AS l     INNER JOIN job_updates
u ON l.update_row_id = u.id           INNER JOIN job_keys AS j ON j.id = u.job_key_id    
    WHERE       u.update_id = ?     AND j.role = ?     AND j.environment = ?     AND j.name
= ?
> ### Cause: org.h2.jdbc.JdbcSQLException: Deadlock detected. The current transaction was
rolled back. Details: "
> Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS
(shared) while locking PUBLIC.JOB_KEYS (exclusive).
> Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS
(shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS
(exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement:
> SELECT
>       lock_token
>     FROM job_update_locks AS l
>     INNER JOIN job_updates u ON l.update_row_id = u.id
>      
>     INNER JOIN job_keys AS j ON j.id = u.job_key_id
>    
>     WHERE  
>     u.update_id = ?
>     AND j.role = ?
>     AND j.environment = ?
>     AND j.name = ? [40001-187]
>         at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30)
>         at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:122)
>         at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:113)
>         at org.apache.ibatis.session.defaults.DefaultSqlSession.selectOne(DefaultSqlSession.java:73)
>         at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:497)
>         at org.apache.ibatis.session.SqlSessionManager$SqlSessionInterceptor.invoke(SqlSessionManager.java:334)
>         at com.sun.proxy.$Proxy57.selectOne(Unknown Source)
>         at org.apache.ibatis.session.SqlSessionManager.selectOne(SqlSessionManager.java:165)
>         at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:69)
>         at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:53)
>         at org.apache.aurora.scheduler.storage.db.$Proxy64.selectLockToken(Unknown Source)
>         at org.apache.aurora.scheduler.storage.db.DbJobUpdateStore.getLockToken(DbJobUpdateStore.java:257)
>         at com.twitter.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:87)
>         at org.apache.aurora.scheduler.storage.ForwardingStore.getLockToken(ForwardingStore.java:178)
>         at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.changeJobUpdateStatus(JobUpdateControllerImpl.java:454)
>         at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.access$1000(JobUpdateControllerImpl.java:108)
>         at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl$4.execute(JobUpdateControllerImpl.java:283)
>         at org.apache.aurora.scheduler.storage.Storage$MutateWork$NoResult.apply(Storage.java:131)
>         at org.apache.aurora.scheduler.storage.Storage$MutateWork$NoResult$Quiet.apply(Storage.java:148)
>         at org.apache.aurora.scheduler.storage.log.LogStorage$24.apply(LogStorage.java:605)
>         at org.apache.aurora.scheduler.storage.log.LogStorage$24.apply(LogStorage.java:602)
>         at org.apache.aurora.scheduler.storage.db.DbStorage.write(DbStorage.java:143)
>         ... 10 more
> Caused by: org.h2.jdbc.JdbcSQLException: Deadlock detected. The current transaction was
rolled back. Details: "
> Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS
(shared) while locking PUBLIC.JOB_KEYS (exclusive).
> Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS
(shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS
(exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement:
> SELECT
>       lock_token
>     FROM job_update_locks AS l
>     INNER JOIN job_updates u ON l.update_row_id = u.id
>      
>     INNER JOIN job_keys AS j ON j.id = u.job_key_id
>    
>     WHERE  
>     u.update_id = ?
>     AND j.role = ?
>     AND j.environment = ?
>     AND j.name = ? [40001-187]
>         at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
>         at org.h2.message.DbException.get(DbException.java:179)
>         at org.h2.message.DbException.get(DbException.java:155)
>         at org.h2.table.RegularTable.doLock1(RegularTable.java:496)
>         at org.h2.table.RegularTable.lock(RegularTable.java:472)
>         at org.h2.table.TableFilter.lock(TableFilter.java:146)
>         at org.h2.command.dml.Select.queryWithoutCache(Select.java:671)
>         at org.h2.command.dml.Query.query(Query.java:322)
>         at org.h2.command.dml.Query.query(Query.java:290)
>         at org.h2.command.dml.Query.query(Query.java:36)
>         at org.h2.command.CommandContainer.query(CommandContainer.java:90)
>         at org.h2.command.Command.executeQuery(Command.java:197)
>         at org.h2.jdbc.JdbcPreparedStatement.execute(JdbcPreparedStatement.java:192)
>         at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:62)
>         at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:78)
>         at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:62)
>         at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:303)
>         at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:154)
>         at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:102)
>         at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:82)
>         at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:120)
>         ... 32 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message