mxnet-dev mailing list archives

From Leonard Lausen <notificati...@github.com>
Subject Re: [apache/incubator-mxnet] [RFC] Use TVMOp with GPU & Build without libcuda.so in CI (#18716)
Date Wed, 15 Jul 2020 16:10:34 GMT
> Violates the effort of removing libcuda.so totally, (would be great if someone can elaborate
> the motivation behind it).

Many customers use a single mxnet build that supports gpu features and deploy it to both gpu
and cpu machines. Because of how cuda containers are designed, libcuda.so won't be present
on the cpu machines. That's why it's better to dlopen libcuda.so lazily, only once it is actually
needed. This affects not only tvmop but also the nvrtc feature in mxnet.
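
To illustrate the "load only once needed" idea, here is a minimal sketch. It is written in
Python with ctypes (ctypes.CDLL wraps dlopen on Linux), not in mxnet's C++ code, and it is not
how mxnet implements this. The soname "libcuda.so.1" and the helper names get_libcuda and
gpu_driver_available are assumptions made for the example; cuInit is the CUDA driver API entry
point.

```python
import ctypes
from typing import Optional

# Assumption: soname of the NVIDIA driver library on Linux.
_LIBCUDA_NAME = "libcuda.so.1"

_libcuda: Optional[ctypes.CDLL] = None
_load_attempted = False


def get_libcuda() -> Optional[ctypes.CDLL]:
    """Hypothetical helper: load the driver library on first use only.

    Returns None instead of failing when the library is absent,
    which is the expected situation on a cpu-only machine.
    """
    global _libcuda, _load_attempted
    if not _load_attempted:
        _load_attempted = True
        try:
            _libcuda = ctypes.CDLL(_LIBCUDA_NAME)  # dlopen happens here
        except OSError:
            _libcuda = None
    return _libcuda


def gpu_driver_available() -> bool:
    """Hypothetical helper: True only if the driver loads and initialises."""
    lib = get_libcuda()
    if lib is None:
        return False
    # cuInit(0) returns 0 (CUDA_SUCCESS) on success.
    return lib.cuInit(0) == 0
```

Nothing here runs at import time, so a cpu-only caller that never asks for a gpu feature never
touches libcuda.so, and one that does ask gets a clean False rather than a loader error.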

Using the stubs is a workaround that avoids dlopen, but it adds the requirement of modifying
LD_LIBRARY_PATH on users' cpu machines. That's not always feasible for users, and for mxnet
1.6, which introduced nvrtc, users typically just disable the nvrtc feature to be able to
deploy the same libmxnet.so to both cpu and gpu machines.

Why not fix the underlying problem and then enable the tvmop feature?

> Also, When setting -DUSE_TVM_OP=OFF the CI checks would be stuck. 

That doesn't make sense, as we have been running CI successfully with tvm op disabled for a
couple of months. Maybe you ran into some unrelated flakiness and need to retrigger the run?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18716#issuecomment-658846227