brooklyn-dev mailing list archives

From Sam Corbett <sam.corb...@cloudsoftcorp.com>
Subject Re: Sudden termination of server, possible thread interruption
Date Thu, 29 Jan 2015 14:23:21 GMT
I ran clocker.sh, which runs `exec java ${JAVA_OPTS} -cp
"${INITIAL_CLASSPATH}" brooklyn.clocker.Main "$@"`. None of the entities
that were running did anything weird. There was no core dump. Shutdown hooks
were run in both cases.
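Since the shutdown hooks ran, it helps to know which kinds of termination trigger them. The JVM runs shutdown hooks on a normal exit, on `System.exit`, and on SIGTERM, SIGINT or SIGHUP. It does not run them on SIGKILL, which is the signal the kernel OOM killer sends. Below is a minimal sketch of a hook that leaves a timestamped marker behind, so a later post-mortem can tell the two cases apart. `ShutdownMarker`, its methods and the marker file are hypothetical and not part of Brooklyn or Clocker:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper (not part of Brooklyn or Clocker): appends a marker
// line to a file when the JVM runs its shutdown hooks.
final class ShutdownMarker {
    private ShutdownMarker() {}

    /** Formats one marker line; kept separate so it can be checked without shutting down. */
    static String markerLine(long epochMillis, String threadName) {
        return "shutdown-hooks-ran at=" + epochMillis + " thread=" + threadName + "\n";
    }

    /** Registers the hook and returns it, so callers can remove it again if needed. */
    static Thread install(final Path markerFile) {
        Thread hook = new Thread(new Runnable() {
            @Override
            public void run() {
                String line = markerLine(System.currentTimeMillis(), Thread.currentThread().getName());
                try {
                    // CREATE + APPEND keeps markers from earlier runs for comparison
                    Files.write(markerFile, line.getBytes(StandardCharsets.UTF_8),
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                } catch (IOException e) {
                    // Too late to do anything else; stderr may still reach the console log
                    System.err.println("could not write shutdown marker: " + e);
                }
            }
        }, "shutdown-marker");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

If the marker file is missing after an unexpected stop, the JVM most likely died without running hooks, for example from SIGKILL.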

The processes were run on a VM in Softlayer. I don't have the precise specs
of the machine to hand, but it was running Debian with 8 GB of RAM and
OpenJDK 7. Brooklyn was (or at least should have been) under a light load in
both cases. Each had a two-host Clocker application running and had had a
few other applications with two or three entities deployed to Clocker and
then stopped again. I no longer have the machine available, so I can't get
to /var/log.

On 29 January 2015 at 13:19, Richard Downer <richard@apache.org> wrote:

> Further questions:
>
> Where were you running this process - on your workstation, or on a
> Linux server elsewhere? What is the spec/OS of the machine running
> Brooklyn? How much "stuff" was going on in Brooklyn? (large number of
> entities or SSH sensors...)
>
> If this was on a Linux server, I'm wondering if the JVM holding
> Brooklyn used up all of the server's memory and the OOM killer was
> invoked. If so you should see a message in the output of the "dmesg"
> command, or in /var/log/syslog or /var/log/messages.
>
> Richard.
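Following Richard's suggestion, here is a small sketch that filters kernel log lines for OOM-killer kills. Those lines might come from the output of `dmesg`, or from /var/log/syslog or /var/log/messages. The message wording in the pattern is an assumption. It varies between kernel versions, so treat the regex as a starting point. `OomLogScan` is a hypothetical helper:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds OOM-killer kills in kernel log lines.
final class OomLogScan {
    private OomLogScan() {}

    // Assumed wording: older kernels print "Out of memory: Kill process N (cmd)"
    // followed by "Killed process N (cmd)"; newer ones print "Out of memory: Killed process N (cmd)".
    private static final Pattern KILL = Pattern.compile(
            "(?:Out of memory: Kill(?:ed)? process|Killed process) (\\d+) \\(([^)]+)\\)");

    /** Returns each killed process once, as "pid (command)", in order of first appearance. */
    static List<String> findKills(List<String> lines) {
        Set<String> kills = new LinkedHashSet<>(); // dedupes the two lines older kernels log per kill
        for (String line : lines) {
            Matcher m = KILL.matcher(line);
            if (m.find()) {
                kills.add(m.group(1) + " (" + m.group(2) + ")");
            }
        }
        return new ArrayList<>(kills);
    }
}
```

Passing it `Files.readAllLines(Paths.get("/var/log/syslog"), StandardCharsets.UTF_8)` on the affected host would list any processes the kernel killed. A process killed this way receives SIGKILL and does not run its shutdown hooks.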
>
>
> On 29 January 2015 at 13:11, Aled Sage <aled.sage@gmail.com> wrote:
> > Hi Sam,
> >
> > First quick questions:
> >
> >  * Was Brooklyn definitely run with `nohup` or `disown`?
> >  * Are you running with any unusual entities that might inadvertently
> >    have a System.exit or some such?!
> >  * I presume there was no core dump file in the run directory?
> >
> > The InterruptedException suggests this might have been a relatively graceful
> > shutdown. Do you see evidence that the shutdown hook has been called (so the
> > management context was shut down cleanly)?
> >
> > Aled
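Aled's point about the InterruptedException can be illustrated with plain JDK behaviour. `ExecutorService.shutdownNow()` interrupts running worker threads. A thread blocked in `Thread.sleep` then fails with an InterruptedException; HotSpot's message is "sleep interrupted", which matches the log. Whether Brooklyn stops its execution manager this way is an assumption, not something confirmed here. The sketch only shows the general mechanism, and `InterruptOnShutdownDemo` is a hypothetical name:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo: a task sleeping in a pool is interrupted by shutdownNow().
final class InterruptOnShutdownDemo {
    private InterruptOnShutdownDemo() {}

    /** Returns true if the sleeping task saw an InterruptedException. */
    static boolean sleeperWasInterrupted() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        final CountDownLatch started = new CountDownLatch(1);
        final AtomicBoolean interrupted = new AtomicBoolean(false);
        pool.submit(new Runnable() {
            @Override
            public void run() {
                started.countDown();
                try {
                    Thread.sleep(60000); // stands in for a long wait, e.g. polling for service-up
                } catch (InterruptedException e) {
                    interrupted.set(true);
                    Thread.currentThread().interrupt(); // keep the interrupt visible to the pool
                }
            }
        });
        started.await();    // the task has started before we shut the pool down
        pool.shutdownNow(); // interrupts the worker thread
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return interrupted.get();
    }
}
```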
> >
> >
> > On 29/01/2015 11:07, Sam Corbett wrote:
> >>
> >> Hi all,
> >>
> >> I'd like help getting to the bottom of the unexpected termination of a
> >> Brooklyn process. It hit me twice in a row yesterday, once with nothing
> >> weird in the logs and once with a number of stacktraces indicating an
> >> InterruptedException was thrown. Both processes were run on the same host
> >> in Softlayer and were running Clocker.
> >>
> >> The first deployment seemed to be working normally. I had deployed a few
> >> applications to my-docker-cloud and stopped them again. A moment later,
> >> the process had stopped. The last thing the server did was check the
> >> status of a Weave container:
> >>
> >> 2015-01-28 07:16:15,002 DEBUG brooklyn.SSH [brooklyn-execmanager-BL31ZSeZ-2001]: check-running WeaveContainerImpl{id=r3fpQokV}, on machine SshMachineLocation[159.8.36.8:159.8.36.8/159.8.36.8:22@DKRM1V05], completed: return status 0
> >> 2015-01-28 07:16:15,371 DEBUG b.launcher.BrooklynWebServer [shutdownHookThread]: BrooklynWebServer detected shutdown: stopping web-console
> >>
> >> There were no interesting exceptions in the debug log.
> >>
> >> In the second case the process stopped while Brooklyn was waiting for the
> >> status of a service that did not provision. This time there were a lot of
> >> stacktraces in the logs. Perhaps the most pertinent was:
> >>
> >> 2015-01-28 09:07:49,891 DEBUG b.u.task.BasicExecutionManager [brooklyn-execmanager-YRXvc51z-1676]: Exception running task Task[post-start:ihocjzth] (rethrowing): java.lang.InterruptedException: sleep interrupted
> >> brooklyn.util.exceptions.RuntimeInterruptedException: java.lang.InterruptedException: sleep interrupted
> >>     at brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:89) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.util.time.Time.sleep(Time.java:312) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.util.time.Time.sleep(Time.java:318) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.util.repeat.Repeater.runKeepingError(Repeater.java:382) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.util.repeat.Repeater.run(Repeater.java:305) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.entity.basic.Entities.waitForServiceUp(Entities.java:1028) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:370) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:367) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.entity.basic.SoftwareProcessDriverLifecycleEffectorTasks.postStartCustom(SoftwareProcessDriverLifecycleEffectorTasks.java:160) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.entity.software.MachineLifecycleEffectorTasks$7.run(MachineLifecycleEffectorTasks.java:431) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_65]
> >>     at brooklyn.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:337) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_65]
> >>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
> >>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
> >>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
> >> Caused by: java.lang.InterruptedException: sleep interrupted
> >>     at java.lang.Thread.sleep(Native Method) [na:1.7.0_65]
> >>     at brooklyn.util.time.Time.sleep(Time.java:310) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     ... 15 common frames omitted
> >>
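The trace shows `Time.sleep` catching the InterruptedException and rethrowing it through `Exceptions.propagate` as an unchecked `RuntimeInterruptedException`, with the original kept as its cause. Below is a minimal sketch of that general pattern. It assumes nothing about Brooklyn's actual implementation; `UncheckedInterruptedException` and `SleepUtil` are hypothetical stand-ins. The sketch also restores the thread's interrupt flag, which `Thread.sleep` clears when it throws, so code further up the stack can still see the interrupt:

```java
// Hypothetical stand-in for brooklyn.util.exceptions.RuntimeInterruptedException.
final class UncheckedInterruptedException extends RuntimeException {
    private static final long serialVersionUID = 1L;

    UncheckedInterruptedException(InterruptedException cause) {
        super(cause); // message becomes "java.lang.InterruptedException: sleep interrupted"
    }
}

// Hypothetical sleep helper showing the wrap-and-rethrow pattern.
final class SleepUtil {
    private SleepUtil() {}

    /** Sleeps, converting an interrupt into an unchecked exception that keeps the cause. */
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag that Thread.sleep cleared
            throw new UncheckedInterruptedException(e);
        }
    }
}
```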
> >> I've got full logs for each run of the server but I didn't get the exit
> >> code of either process. I ran a third test on the same host later in the
> >> day and nothing went wrong (in a reasonable timeframe).
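About the missing exit code: Unix shells report a process killed by signal N as exit status 128 + N, and OpenJDK's `Process.exitValue()` follows the same convention on Unix. A HotSpot JVM that shuts down in response to SIGTERM typically exits with 143 (128 + 15) as well. A status of 137 (128 + 9, SIGKILL) would therefore point towards something like the OOM killer, while 143 or 129 would point at SIGTERM or SIGHUP. Below is a minimal decoder; `ExitStatus` is a hypothetical helper. A program can also choose to exit with a value above 128 on its own, so the result is a hint rather than proof:

```java
// Hypothetical helper: interprets an exit status using the 128 + N shell convention.
final class ExitStatus {
    private ExitStatus() {}

    /** Describes an exit status, treating 129..192 as "killed by signal (status - 128)". */
    static String describe(int status) {
        if (status > 128 && status <= 128 + 64) { // Linux signal numbers run from 1 to 64
            int signal = status - 128;
            return "killed by signal " + signal + signalName(signal);
        }
        return "exited with code " + status;
    }

    // Names for the Linux signals most relevant here; others are left unnamed
    private static String signalName(int signal) {
        switch (signal) {
            case 1: return " (SIGHUP)";
            case 2: return " (SIGINT)";
            case 9: return " (SIGKILL)";
            case 15: return " (SIGTERM)";
            default: return "";
        }
    }
}
```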
> >>
> >> Has anyone experienced this before? Are there any system logs I could have
> >> looked at for more information? A brief look at the standard /var/log
> >> files revealed nothing.
> >>
> >> It was a bit alarming to see that in the first instance the process
> >> stopped with no indication of why.
> >>
> >> Sam
> >>
> >
>
