aurora-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jordan Ly (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-1950) Exceptions in LogStorage.prepare() cause scheduler die and stay in DEAD state
Date Wed, 27 Sep 2017 20:54:00 GMT

    [ https://issues.apache.org/jira/browse/AURORA-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183246#comment-16183246
] 

Jordan Ly commented on AURORA-1950:
-----------------------------------

https://reviews.apache.org/r/62626/

> Exceptions in LogStorage.prepare() cause scheduler die and stay in DEAD state
> -----------------------------------------------------------------------------
>
>                 Key: AURORA-1950
>                 URL: https://issues.apache.org/jira/browse/AURORA-1950
>             Project: Aurora
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Jordan Ly
>            Assignee: Jordan Ly
>         Attachments: prepare_failure.patch
>
>
> If the Scheduler fails while starting up its storage (calling storage.prepare() in PREPARING_STORAGE
state in SchedulerLifecycle.java), the Scheduler will transition to dead but incorrectly shutdown.
The non-daemon threads will fail to stop, leaving the scheduler in the DEAD state.
> The logs showing the issue:
> {noformat}
> I0926 01:02:34.118 [HttpServerLauncher STARTING, AbstractConnector] Started ServerConnector@6b15ba8c{HTTP/1.1,[http/1.1]}{0.0.0.0:8081}
> I0926 01:02:34.120 [HttpServerLauncher STARTING, Server] Started @7043ms
> I0926 01:02:34.122 [main, StateMachine] SchedulerLifecycle state machine transition IDLE
-> PREPARING_STORAGE
> E0926 01:02:35.023 [main, SchedulerLifecycle] Caught unchecked exception: java.lang.IllegalStateException:
Failed to open the log, cannot continue java.lang.IllegalStateException: Failed to open the
log, cannot continue
>         at org.apache.aurora.scheduler.storage.log.LogStorage.prepare(LogStorage.java:407)
>         at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83)
>         at org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.prepare(CallOrderEnforcingStorage.java:86)
>         at org.apache.aurora.scheduler.SchedulerLifecycle$3.accept(SchedulerLifecycle.java:216)
>         at org.apache.aurora.scheduler.SchedulerLifecycle$3.accept(SchedulerLifecycle.java:213)
>         at org.apache.aurora.common.base.Consumers.lambda$filter$1(Consumers.java:74)
>         at org.apache.aurora.scheduler.SchedulerLifecycle.lambda$dieOnError$1(SchedulerLifecycle.java:337)
>         at org.apache.aurora.common.base.Consumers.lambda$filter$1(Consumers.java:74)
>         at java.util.function.Consumer.lambda$andThen$0(Consumer.java:65)
>         at java.util.function.Consumer.lambda$andThen$0(Consumer.java:65)
>         at java.util.function.Consumer.lambda$andThen$0(Consumer.java:65)
>         at java.util.function.Consumer.lambda$andThen$0(Consumer.java:65)
>         at java.util.function.Consumer.lambda$andThen$0(Consumer.java:65)
>         at java.util.function.Consumer.lambda$andThen$0(Consumer.java:65)
>         at java.util.function.Consumer.lambda$andThen$0(Consumer.java:65)
>         at org.apache.aurora.common.util.StateMachine.transition(StateMachine.java:160)
>         at org.apache.aurora.scheduler.SchedulerLifecycle.prepare(SchedulerLifecycle.java:354)
>         at org.apache.aurora.scheduler.app.SchedulerMain.run(SchedulerMain.java:148)
>         at org.apache.aurora.scheduler.app.SchedulerMain.flagConfiguredMain(SchedulerMain.java:219)
>         at org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:243)
> Caused by: java.io.IOException: blah blah blah
>         at org.apache.aurora.scheduler.storage.log.LogStorage.prepare(LogStorage.java:404)
>         ... 19 common frames omitted
> I0926 01:02:35.023 [main, StateMachine] SchedulerLifecycle state machine transition PREPARING_STORAGE
-> DEAD
> I0926 01:02:35.028 [main, StateMachine] storage state machine transition CONSTRUCTED
-> STOPPED
> I0926 01:02:35.028 [main, Lifecycle] Shutting down application
> I0926 01:02:35.028 [main, ShutdownRegistry$ShutdownRegistryImpl] Executing 4 shutdown
commands.
> I0926 01:02:35.047 [main, StateMachine] SchedulerLifecycle state machine transition DEAD
-> DEAD
> I0926 01:02:35.047 [main, SchedulerLifecycle] Shutdown already invoked, ignoring extra
call.
> I0926 01:02:35.053 [Curator-Framework-0, CuratorFrameworkImpl] backgroundOperationsLoop
exiting
> I0926 01:02:35.066 [main-EventThread, ClientCnxn] EventThread shut down for session:
0x15ebba3b77c0010
> I0926 01:02:35.066 [main, ZooKeeper] Session: 0x15ebba3b77c0010 closed
> I0926 01:02:35.067 [main, SchedulerMain] Application run() exited.
> E0926 01:02:35.067 [main, SchedulerMain] Uncaught exception from Thread[main,5,main]:java.lang.IllegalStateException:
Failed to open the log, cannot continue java.lang.IllegalStateException: Failed to open the
log, cannot continue
>         at org.apache.aurora.scheduler.storage.log.LogStorage.prepare(LogStorage.java:407)
>         at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83)
>         at org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.prepare(CallOrderEnforcingStorage.java:86)
>         at org.apache.aurora.scheduler.SchedulerLifecycle$3.accept(SchedulerLifecycle.java:216)
>         at org.apache.aurora.scheduler.SchedulerLifecycle$3.accept(SchedulerLifecycle.java:213)
>         at org.apache.aurora.common.base.Consumers.lambda$filter$1(Consumers.java:74)
>         at org.apache.aurora.scheduler.SchedulerLifecycle.lambda$dieOnError$1(SchedulerLifecycle.java:337)
>         at org.apache.aurora.common.base.Consumers.lambda$filter$1(Consumers.java:74)
>         at java.util.function.Consumer.lambda$andThen$0(Consumer.java:65)
>         at java.util.function.Consumer.lambda$andThen$0(Consumer.java:65)
>         at java.util.function.Consumer.lambda$andThen$0(Consumer.java:65)
>         at java.util.function.Consumer.lambda$andThen$0(Consumer.java:65)
>         at java.util.function.Consumer.lambda$andThen$0(Consumer.java:65)
>         at java.util.function.Consumer.lambda$andThen$0(Consumer.java:65)
>         at java.util.function.Consumer.lambda$andThen$0(Consumer.java:65)
>         at org.apache.aurora.common.util.StateMachine.transition(StateMachine.java:160)
>         at org.apache.aurora.scheduler.SchedulerLifecycle.prepare(SchedulerLifecycle.java:354)
>         at org.apache.aurora.scheduler.app.SchedulerMain.run(SchedulerMain.java:148)
>         at org.apache.aurora.scheduler.app.SchedulerMain.flagConfiguredMain(SchedulerMain.java:219)
>         at org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:243)
> Caused by: java.io.IOException: blah blah blah
>         at org.apache.aurora.scheduler.storage.log.LogStorage.prepare(LogStorage.java:404)
>         ... 19 common frames omitted
> I0926 01:02:51.943 [qtp2025987098-37, RequestLog] 192.168.33.1 - - [26/Sep/2017:01:02:51
+0000] "GET //192.168.33.7:8081/threads HTTP/1.1" 200 2120
> I0926 01:02:52.925 [qtp2025987098-38, RequestLog] 192.168.33.1 - - [26/Sep/2017:01:02:52
+0000] "GET //192.168.33.7:8081/threads HTTP/1.1" 200 2134
> {noformat}
> You will still be able to access the "DEAD" Scheduler's HTTP interface. Checking the
"/threads" endpoint shows that services that have failed to stop.
> I've attached a diff that can replicate the issue. Just bring up the Vagrant cluster
and run the Scheduler with the patch applied.
> On a unrelated note: CallOrderEnforcingStorage.java does not allow transitions to STOPPED
from any state other than READY. I am not sure if this is intentional, but I've added the
transitions in the patch to isolate the above error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message