brooklyn-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sam Corbett <sam.corb...@cloudsoftcorp.com>
Subject Re: Sudden termination of server, possible thread interruption
Date Thu, 29 Jan 2015 14:53:49 GMT
Argh. That's boring and plausible. I will always `nohup .. &` in future.

On 29 January 2015 at 14:29, Aled Sage <aled.sage@gmail.com> wrote:

> Sam,
>
> Am I reading that correctly: you ran `clocker.sh` without either of nohup
> or disown, and those are not automatically run by clocker.sh?
>
> If that is the case, the most likely explanation is that your terminal
> session terminated so a sighup was sent to all jobs, including your clocker
> process.
>
> Aled
>
> p.s. see the link that Alasdair sent out: http://unix.stackexchange.com/
> questions/3886/difference-between-nohup-disown-and
>
>
>
> On 29/01/2015 14:23, Sam Corbett wrote:
>
>> I ran clocker.sh, which runs `exec java ${JAVA_OPTS} -cp
>> "${INITIAL_CLASSPATH}" brooklyn.clocker.Main "$@"`. None of the entities
>> that were running do anything weird. There was no core dump. Shutdown
>> hooks
>> were run in both cases.
>>
>> The processes were run on a VM in Softlayer. I don't have the precise
>> specs
>> of the machine to hand but it was Debian, had 8Gb of RAM and was running
>> OpenJDK 7. Brooklyn was - or at least should have been - operating a light
>> load in both cases. Each had a two-host Clocker application running and
>> had
>> had a few other applications with two or three entities deployed to
>> Clocker
>> and then stopped again. I don't have the machine available any more so
>> can't get to /var/log.
>>
>> On 29 January 2015 at 13:19, Richard Downer <richard@apache.org> wrote:
>>
>>  Further questions:
>>>
>>> Where were you running this process - on your workstation, or on a
>>> Linux server elsewhere? What is the spec/OS of the machine running
>>> Brooklyn? How much "stuff" was going on in Brooklyn? (large number of
>>> entites or SSH sensors...)
>>>
>>> If this was on a Linux server, I'm wondering if the JVM holding
>>> Brooklyn used up all of the server's memory and the OOM killer was
>>> invoked. If so you should see a message in the output of the "dmesg"
>>> command, or in /var/log/syslog or /var/log/messages.
>>>
>>> Richard.
>>>
>>>
>>> On 29 January 2015 at 13:11, Aled Sage <aled.sage@gmail.com> wrote:
>>>
>>>> Hi Sam,
>>>>
>>>> First quick questions:
>>>>
>>>>   * Was brooklyn definitely run with `nohup` or `disown`?
>>>>   * Are you running with any unusual entities that might inadvertently
>>>>     have a System.exit or some such?!
>>>>   * I presume there was no core dump file in the run directory?
>>>>
>>>> The InterruptedException suggest this might be a relatively gracefully
>>>> shutdown. Do you see evidence that the shutdown hook has been called (so
>>>>
>>> the
>>>
>>>> management context was shut down cleanly)?
>>>>
>>>> Aled
>>>>
>>>>
>>>> On 29/01/2015 11:07, Sam Corbett wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'd like help getting to the bottom of the unexpected termination of
a
>>>>> Brooklyn process. It hit me twice in a row yesterday, once with nothing
>>>>> weird in the logs and once with a number of stacktraces indicating an
>>>>> InterruptedException was thrown. Both processes were run on the same
>>>>>
>>>> host
>>>
>>>> in Softlayer and were running Clocker.
>>>>>
>>>>> The first deployment seemed to be working normally. I had deployed a
>>>>> few
>>>>> applications to my-docker-cloud and stopped them again. A moment later
>>>>>
>>>> and
>>>
>>>> the process had stopped. The last thing the server did was check the
>>>>> status
>>>>> of a Weave container:
>>>>>
>>>>> 2015-01-28 07:16:15,002 DEBUG brooklyn.SSH
>>>>> [brooklyn-execmanager-BL31ZSeZ-2001]: check-running
>>>>> WeaveContainerImpl{id=r3fpQokV}, on machine
>>>>> SshMachineLocation[159.8.36.8:159.8.36.8/159.8.36.8:22@DKRM1V05],
>>>>> completed: return status 0
>>>>> 2015-01-28 07:16:15,371 DEBUG b.launcher.BrooklynWebServer
>>>>> [shutdownHookThread]: BrooklynWebServer detected shutdown: stopping
>>>>> web-console
>>>>>
>>>>> There were no interesting exceptions in the debug log.
>>>>>
>>>>> In the second case the process stopped as Brooklyn waited for the
>>>>> status
>>>>> of
>>>>> a service that did not provision. This time there was a (lot of)
>>>>> stacktrace
>>>>> in the logs. Most pertinently perhaps was:
>>>>>
>>>>> 2015-01-28 09:07:49,891 DEBUG b.u.task.BasicExecutionManager
>>>>> [brooklyn-execmanager-YRXvc51z-1676]: Exception running task
>>>>> Task[post-start:ihocjzth] (rethrowing): java.lang.
>>>>> InterruptedException:
>>>>> sleep interrupted
>>>>> brooklyn.util.exceptions.RuntimeInterruptedException:
>>>>> java.lang.InterruptedException: sleep interrupted
>>>>> at brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:89)
>>>>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at brooklyn.util.time.Time.sleep(Time.java:312)
>>>>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at brooklyn.util.time.Time.sleep(Time.java:318)
>>>>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at brooklyn.util.repeat.Repeater.runKeepingError(Repeater.java:382)
>>>>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at brooklyn.util.repeat.Repeater.run(Repeater.java:305)
>>>>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at brooklyn.entity.basic.Entities.waitForServiceUp(Entities.java:1028)
>>>>> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at
>>>>>
>>>>>
>>>>>  brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(
>>> SoftwareProcessImpl.java:370)
>>>
>>>> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at
>>>>>
>>>>>
>>>>>  brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(
>>> SoftwareProcessImpl.java:367)
>>>
>>>> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at
>>>>>
>>>>>
>>>>>  brooklyn.entity.basic.SoftwareProcessDriverLifecycle
>>> EffectorTasks.postStartCustom(SoftwareProcessDriverLifecycle
>>> EffectorTasks.java:160)
>>>
>>>> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at
>>>>>
>>>>>
>>>>>  brooklyn.entity.software.MachineLifecycleEffectorTasks$7.run(
>>> MachineLifecycleEffectorTasks.java:431)
>>>
>>>> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at
>>>>>
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>
>>>> ~[na:1.7.0_65]
>>>>> at
>>>>>
>>>>>
>>>>>  brooklyn.util.task.DynamicSequentialTask$DstJob.
>>> call(DynamicSequentialTask.java:337)
>>>
>>>> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at
>>>>>
>>>>>
>>>>>  brooklyn.util.task.BasicExecutionManager$SubmissionCallable.call(
>>> BasicExecutionManager.java:469)
>>>
>>>> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>
>>>> [na:1.7.0_65]
>>>
>>>> at
>>>>>
>>>>>
>>>>>  java.util.concurrent.ThreadPoolExecutor.runWorker(
>>> ThreadPoolExecutor.java:1145)
>>>
>>>> [na:1.7.0_65]
>>>>> at
>>>>>
>>>>>
>>>>>  java.util.concurrent.ThreadPoolExecutor$Worker.run(
>>> ThreadPoolExecutor.java:615)
>>>
>>>> [na:1.7.0_65]
>>>>> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
>>>>> Caused by: java.lang.InterruptedException: sleep interrupted
>>>>> at java.lang.Thread.sleep(Native Method) [na:1.7.0_65]
>>>>> at brooklyn.util.time.Time.sleep(Time.java:310)
>>>>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> ... 15 common frames omitted
>>>>>
>>>>> I've got full logs for each run of the server but I didn't get the exit
>>>>> code of either process. I ran a third test on the same host later in
>>>>> the
>>>>> day and nothing went wrong (in a reasonable timeframe).
>>>>>
>>>>> Has anyone experienced this before? Are there any system logs I could
>>>>>
>>>> have
>>>
>>>> looked to for more information? A brief look at the standard /var/log
>>>>> files
>>>>> revealed nothing.
>>>>>
>>>>> It was a bit alarming to see that in the first instance the process
>>>>> stopped
>>>>> with no indication why.
>>>>>
>>>>> Sam
>>>>>
>>>>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message