mxnet-dev mailing list archives

From Marco de Abreu <marco.g.ab...@googlemail.com>
Subject Re: CI: nvml: Driver/library version mismatch
Date Wed, 10 Jan 2018 20:37:44 GMT
Small update to give you some background: We have been able to get the CI
back to a stable state - thanks to Pedro and Kellen! The cause of this issue
was a required security update related to the Spectre vulnerability:
https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-384/+bug/1741807
This update was not compatible with the installed nvidia-docker version and
thus broke our CI. I have installed all updates, validated that
nvidia-docker is working again and started a new set of
mxnet-linux-gpu-slaves. If any issues arise, please don't hesitate to drop
a quick message on this thread.

-Marco
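
For anyone who wants to check a slave for this condition, here is a minimal
sketch of a health check. It assumes that nvidia-smi is on the PATH and that
the mismatch shows up in its output with the same wording as the error
quoted in this thread; the function names and status strings are made up
for illustration and are not part of the CI setup.

```python
import subprocess

# Wording taken from the error quoted in this thread, compared case-insensitively.
MISMATCH_MARKER = "driver/library version mismatch"


def has_driver_mismatch(output: str) -> bool:
    """Return True if the given tool output mentions the NVML mismatch."""
    return MISMATCH_MARKER in output.lower()


def gpu_runtime_status(command=("nvidia-smi",)) -> str:
    """Run a GPU query command and classify the result.

    Returns one of "ok", "mismatch", "error" or "unavailable".
    The default command is an assumption; pass another one if needed.
    """
    try:
        result = subprocess.run(
            list(command),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Command missing, not executable, or hanging.
        return "unavailable"
    if has_driver_mismatch(result.stdout):
        return "mismatch"
    return "ok" if result.returncode == 0 else "error"


if __name__ == "__main__":
    print(gpu_runtime_status())
```

A "mismatch" result would suggest that the driver was updated underneath a
running system and that the machine still needs a restart.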

On Wed, Jan 10, 2018 at 6:45 PM, Marco de Abreu <
marco.g.abreu@googlemail.com> wrote:

> Hello,
>
> Recently, Nvidia released a new version of their CUDA and GPU drivers for
> Ubuntu 16.04. This update was applied automatically while the slaves
> were running, which caused the nvidia-docker-daemon to disconnect. Because
> the update requires a restart, the daemon was not able to reconnect and
> reported the error 'nvml: Driver/library version mismatch'. We have restarted
> all slaves to apply the update.
>
> In future, we plan to explicitly disallow automated updates of all
> nvidia-related drivers.
>
> Best regards,
> Marco
>
>
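
One possible way to disallow automated updates of the nvidia-related
packages, sketched under the assumption that the slaves receive their
automatic updates through Ubuntu's unattended-upgrades (the thread does not
say which mechanism is in use). The helper below only renders a
Package-Blacklist stanza as text; the pattern "nvidia-" and the function
name are illustrative, and the result would still have to be placed in a
file under /etc/apt/apt.conf.d/ by whoever maintains the slaves.

```python
def render_blacklist(patterns):
    """Render an unattended-upgrades Package-Blacklist stanza.

    Each entry is a package name pattern that unattended-upgrades
    should skip. The patterns passed in are the caller's choice.
    """
    lines = ["Unattended-Upgrade::Package-Blacklist {"]
    for pattern in patterns:
        lines.append('    "{}";'.format(pattern))
    lines.append("};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative pattern only; adjust to the packages actually installed.
    print(render_blacklist(["nvidia-"]), end="")
```

Holding the packages with apt-mark hold would be an alternative that also
blocks manual upgrades until the hold is released.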
