mxnet-dev mailing list archives

From Marco de Abreu <marco.g.ab...@googlemail.com>
Subject Re: Problem with Jenkins GPU instances?
Date Thu, 03 May 2018 19:01:14 GMT
Hello,

I'm already investigating the issue, and it seems to be related to the
recently introduced KVStore tests. They tend to hang, which leads to the
jobs being forcefully terminated by Jenkins. The problem is that this does
not terminate the underlying Docker containers, so their resources are
never released.

As an immediate fix, I will restart all slaves to get the CI running
again. After that, I will look for a way to detect and release these
orphaned containers.
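
For the longer-term fix, one possible direction is to have the CI wrapper
remember which containers it started and force-remove them when it is shut
down. This is only a rough sketch, assuming the wrapper is a Python script
that receives SIGTERM (or SIGINT) when Jenkins aborts the job;
`ContainerCleanup` and `run_in_container` are hypothetical names, not
existing CI code.

```python
import signal
import subprocess
import sys
from typing import Callable, List, Optional


class ContainerCleanup:
    """Remember containers started by this wrapper and force-remove them.

    Hypothetical helper, not part of the existing CI scripts.
    """

    def __init__(self, remove_fn: Optional[Callable[[str], None]] = None) -> None:
        self.container_ids: List[str] = []
        # Injectable so the logic can be exercised without a Docker daemon.
        self.remove_fn = remove_fn or self._docker_rm

    @staticmethod
    def _docker_rm(container_id: str) -> None:
        # `docker rm -f` kills the container if it is running, then removes it.
        subprocess.run(["docker", "rm", "-f", container_id], check=False)

    def track(self, container_id: str) -> None:
        self.container_ids.append(container_id)

    def cleanup(self) -> None:
        # Remove in reverse start order; pop() makes repeated calls harmless.
        while self.container_ids:
            self.remove_fn(self.container_ids.pop())

    def install_signal_handlers(self) -> None:
        # Assumption: an aborted Jenkins job delivers SIGTERM/SIGINT to this
        # wrapper before escalating to SIGKILL, which cannot be caught.
        def handler(signum, _frame):
            self.cleanup()
            sys.exit(128 + signum)

        signal.signal(signal.SIGTERM, handler)
        signal.signal(signal.SIGINT, handler)


def run_in_container(cleanup: ContainerCleanup, image: str, command: List[str]) -> int:
    # Start detached so the container ID is known even if we are killed later.
    container_id = subprocess.run(
        ["docker", "run", "-d", image] + command,
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    cleanup.track(container_id)
    try:
        # `docker wait` blocks until the container exits and prints its exit code.
        result = subprocess.run(
            ["docker", "wait", container_id],
            check=True, capture_output=True, text=True,
        )
        return int(result.stdout.strip())
    finally:
        # Runs on normal exit and on the SystemExit raised by the signal handler.
        cleanup.cleanup()
```

It would be used roughly as `cleanup = ContainerCleanup()`,
`cleanup.install_signal_handlers()`, then
`sys.exit(run_in_container(cleanup, image, command))`. Starting the
container detached matters: if a foreground `docker run` client is itself
killed (e.g. with SIGKILL), the container keeps running, which would be
consistent with the leak described above. A SIGKILL to the wrapper still
bypasses the handler, so a periodic sweep of stale containers on each slave
would remain a useful backstop.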

Best regards,
Marco

On Thu, May 3, 2018 at 8:55 PM, Jin, Hao <hjjn@amazon.com> wrote:

> I’ve encountered 2 failed GPU builds due to “initialization error: driver
> error: failed to process request”, the links to the failed builds are:
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10645/17/pipeline/674
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10533/18/pipeline
