mesos-issues mailing list archives

From "Kevin Klues (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-7730) CUDA not working anymore on 1.3.0
Date Fri, 21 Jul 2017 17:20:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096554#comment-16096554 ]

Kevin Klues commented on MESOS-7730:
------------------------------------

Hmm. I'm not sure what would have changed to cause this error. I'll dig into it soon.

> CUDA not working anymore on 1.3.0
> ---------------------------------
>
>                 Key: MESOS-7730
>                 URL: https://issues.apache.org/jira/browse/MESOS-7730
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.3.0
>            Reporter: Adam Cecile
>            Assignee: Kevin Klues
>             Fix For: 1.2.1
>
>
> Hello,
> My Docker container using CUDA does not detect it anymore.
> Here is the TensorFlow output with 1.2.1:
> {noformat}
> I0628 12:39:45.505900 16309 exec.cpp:162] Version: 1.2.1
> I0628 12:39:45.508358 16301 exec.cpp:237] Executor registered on agent 84c99d0b-8551-4f30-a9bc-6c1edbf7c18c-S1
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
> I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
> name: GeForce GTX 1080
> major: 6 minor: 1 memoryClockRate (GHz) 1.7335
> pciBusID 0000:82:00.0
> Total memory: 7.92GiB
> Free memory: 7.81GiB
> I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
> I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
> I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:82:00.0)
> {noformat}
> And with 1.3.0:
> {noformat}
> I0628 12:40:30.833947 16854 exec.cpp:162] Version: 1.3.0
> I0628 12:40:30.836612 16845 exec.cpp:237] Executor registered on agent 84c99d0b-8551-4f30-a9bc-6c1edbf7c18c-S1
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
> I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH:
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: zelda.service.earthlab.lu
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:363] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.66  Mon May  1 15:29:16 PDT 2017
> GCC version:  gcc version 4.9.2 (Debian 4.9.2-10)
> """
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 375.66.0
> I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1065] LD_LIBRARY_PATH:
> I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1066] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so.1; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
> E tensorflow/stream_executor/cuda/cuda_driver.cc:509] failed call to cuInit: CUDA_ERROR_NO_DEVICE
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: zelda.service.earthlab.lu
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: zelda.service.earthlab.lu
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:363] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.66  Mon May  1 15:29:16 PDT 2017
> GCC version:  gcc version 4.9.2 (Debian 4.9.2-10)
> """
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 375.66.0
> {noformat}
> All I did was upgrade/downgrade the Mesos package and restart the container. I ran the test several times and it is 100% reproducible.
> Regards, Adam.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
