infra-issues mailing list archives

From "yifan zou (JIRA)" <>
Subject [jira] [Commented] (INFRA-16380) Nodes Failing With "Backing Channel '<slave_name>' is disconnected.
Date Thu, 19 Apr 2018 04:43:00 GMT


yifan zou commented on INFRA-16380:

I am glad to know we have Datadog set up, and thanks to [~cthistle] for creating the dashboard.
As far as I know, Jenkins is the only program running on the beam machines, and CPU usage
will be close to 100% whenever jobs are running. Besides the CPU and memory usage that [~jasonkuster]
mentioned in this thread, is it possible to get any data at the Jenkins task level, such as
average time spent in the waiting queue, or average execution time of the Java/Python PreCommit
(or other popular jobs)? That would also help us decide whether it is worth resizing CPU and
memory to gain better performance.
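
To illustrate the kind of task-level summary meant here, below is a minimal sketch in Python. It assumes build records have already been exported from Jenkins into plain dictionaries; the field names `job`, `queue_ms` and `duration_ms` and the function `summarize_builds` are hypothetical and chosen only for this example, and the sample numbers are made up, not measurements from the beam machines.

```python
from statistics import mean


def summarize_builds(builds):
    """Return per-job average queue time and execution time in seconds.

    `builds` is an iterable of dicts with the hypothetical keys
    "job", "queue_ms" and "duration_ms" (all assumptions for this sketch).
    """
    # Group the records by job name.
    by_job = {}
    for build in builds:
        by_job.setdefault(build["job"], []).append(build)

    # Average each metric per job and convert milliseconds to seconds.
    summary = {}
    for job, items in by_job.items():
        summary[job] = {
            "builds": len(items),
            "avg_queue_s": mean(b["queue_ms"] for b in items) / 1000.0,
            "avg_exec_s": mean(b["duration_ms"] for b in items) / 1000.0,
        }
    return summary


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        {"job": "beam_PreCommit_Java_GradleBuild", "queue_ms": 60000, "duration_ms": 1800000},
        {"job": "beam_PreCommit_Java_GradleBuild", "queue_ms": 120000, "duration_ms": 2400000},
    ]
    for job, stats in summarize_builds(sample).items():
        print(job, stats)
```

Comparing these per-job averages before and after a resize would show whether the extra CPU and memory actually shortens queueing or execution time.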

> Nodes Failing With "Backing Channel '<slave_name>' is disconnected.
> -------------------------------------------------------------------
>                 Key: INFRA-16380
>                 URL:
>             Project: Infrastructure
>          Issue Type: Bug
>          Components: Jenkins
>            Reporter: Jason Kuster
>            Priority: Major
> We've seen a couple of Jenkins builds dying with the cited error. They generally look like the following:
> {code}
> FATAL: command execution failed
> Backing channel 'beam2' is disconnected.
> 	at hudson.remoting.RemoteInvocationHandler.channelOrFail(
> 	at hudson.remoting.RemoteInvocationHandler.invoke(
> 	at com.sun.proxy.$Proxy131.isAlive(Unknown Source)
> 	at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(
> 	at hudson.Launcher$RemoteLauncher$ProcImpl.join(
> 	at hudson.Launcher$ProcStarter.join(
> 	at hudson.plugins.gradle.Gradle.performTask(
> 	at hudson.plugins.gradle.Gradle.perform(
> 	at hudson.tasks.BuildStepMonitor$1.perform(
> 	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(
> 	at hudson.model.Build$
> 	at hudson.model.Build$BuildExecution.doRun(
> 	at hudson.model.AbstractBuild$
> 	at hudson.model.Run.execute(
> 	at
> 	at hudson.model.ResourceController.execute(
> 	at
> Caused by: Unexpected termination of the channel
> 	at hudson.remoting.SynchronousCommandTransport$
> Caused by:
> 	at$PeekInputStream.readFully(
> 	at$BlockDataInputStream.readShort(
> 	at
> 	at<init>(
> 	at hudson.remoting.ObjectInputStreamEx.<init>(
> 	at
> 	at hudson.remoting.SynchronousCommandTransport$
> Build step 'Invoke Gradle script' changed build result to FAILURE
> Build step 'Invoke Gradle script' marked build as failure
> ERROR: Step ‘Publish JUnit test result report’ failed: no workspace for beam_PreCommit_Java_GradleBuild
> ERROR: beam2 is offline; cannot locate JDK 1.8 (latest)
> ERROR: beam2 is offline; cannot locate JDK 1.8 (latest)
> ERROR: beam2 is offline; cannot locate JDK 1.8 (latest)
> ERROR: beam2 is offline; cannot locate JDK 1.8 (latest)
> Setting status of 1073baaaa633dd34ed552812e65108944eb92ac6 to FAILURE with url and message: 'FAILURE
>  '
> Using context: Jenkins: ./gradlew :javaPreCommit
> ERROR: beam2 is offline; cannot locate JDK 1.8 (latest)
> Finished: FAILURE
> {code}

This message was sent by Atlassian JIRA
