mxnet-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] cbalioglu opened a new issue #9744: Performance regression when OMP_NUM_THREADS environment variable is not set
Date Thu, 01 Jan 1970 00:00:00 GMT
cbalioglu opened a new issue #9744: Performance regression when OMP_NUM_THREADS environment
variable is not set
URL: https://github.com/apache/incubator-mxnet/issues/9744
 
 
   ## Affected Versions
   - 1.0.0
   - 1.1.0.rc0
   - 1.1.0.rc1
   
   ## Build Options
   - USE_OPENMP=ON
   
   ## Environments Tested
   OS: 
   - Amazon Linux 2012.03 (Kernel 4.4.48)
   - Amazon Linux 2017.03 (Kernel 4.4.48)
   
   Hardware:
   - AWS m4.2xlarge
   - AWS c4.2xlarge
   - AWS p3.2xlarge
   
   ## Impact
   The benchmark numbers below were produced by training a symbolic NTM algorithm on a quad-core
(8 logical cores) Intel Xeon E5-2676 CPU. However, the regression is reproducible by running
any type of symbolic graph on a CPU context.
   
   OMP_NUM_THREADS | 1.0.0 + GNU OpenMP | 1.0.0 + Intel OpenMP | 1.1.0.rc0 + GNU OpenMP
   ----------------------|----------------------|----------------------|--------------------------
   not set | 75s | 116s | 66s
   4 | 65s | 77s | 48s
   
   ## Root Cause
   The `mxnet::engine::OpenMP` class already has an [implementation](https://github.com/apache/incubator-mxnet/blob/3761f2f36e7e335f17e5c1dcbfe215de27f73fd5/src/engine/openmp.cc#L53)
that automatically sets the optimal number of OpenMP threads when the `OMP_NUM_THREADS`
environment variable is not explicitly set. However, this implementation does not completely
solve the regression because the effect of `omp_set_num_threads` is thread-local: it applies only to the calling thread.
   
   The setup logic referred to above is only [called](https://github.com/apache/incubator-mxnet/blob/3761f2f36e7e335f17e5c1dcbfe215de27f73fd5/src/engine/openmp.cc#L102)
during the static initialization of the `OpenMP` class in the main thread. However, the `mxnet::engine::ThreadedEngine`
class uses OpenMP not only in the main thread but also in its worker threads. Because `omp_set_num_threads`
is never called in these threads, they attempt to use all logical cores for parallelized regions,
which causes the regression.
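   
   The thread-local behavior described above can be observed with a small standalone sketch. This is not MXNet code: `ThreadCounts` and `ObserveMaxThreads` are hypothetical names introduced only for illustration, and the sketch assumes a compiler invoked with OpenMP enabled (for example `-fopenmp`); without it, the guarded calls are skipped and both fields stay at 1. On runtimes that behave as this issue describes, the worker value is expected to remain at the runtime default rather than the limit set in the main thread, but the exact numbers depend on the OpenMP runtime and the machine.

```cpp
#include <cassert>
#include <thread>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper type: the OpenMP thread limit as seen by two threads.
struct ThreadCounts {
  int main_thread;
  int worker_thread;
};

// Sets the limit in the calling thread only, then asks a freshly created
// thread what limit it sees. No call to omp_set_num_threads is made in
// the new thread, which mirrors the situation described in the issue.
ThreadCounts ObserveMaxThreads(int limit) {
  assert(limit > 0);
  ThreadCounts counts{1, 1};
#ifdef _OPENMP
  omp_set_num_threads(limit);
  counts.main_thread = omp_get_max_threads();
  std::thread worker([&counts] {
    counts.worker_thread = omp_get_max_threads();
  });
  worker.join();
#endif
  return counts;
}
```

   Calling `ObserveMaxThreads(4)` and printing both fields on a machine with more than four logical cores is one way to check whether a given OpenMP runtime shows the same discrepancy.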
   
   ## Solution
   The `omp_set_num_threads` function has to be called explicitly in every thread that uses OpenMP
whenever the `OMP_NUM_THREADS` environment variable is not set by the user.
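   
   A minimal sketch of this idea follows. It is not the fix that was eventually implemented in MXNet: the helper names `UserSetOmpNumThreads`, `ApplyOmpLimitToCurrentThread`, and `RunWorkers` are hypothetical, and the thread limit is passed in by the caller rather than computed. The sketch assumes compilation with OpenMP enabled (for example `-fopenmp`); without it, the OpenMP calls are compiled out and each worker reports 1.

```cpp
#include <cassert>
#include <cstdlib>
#include <thread>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: true when the user set OMP_NUM_THREADS explicitly.
bool UserSetOmpNumThreads() {
  const char* value = std::getenv("OMP_NUM_THREADS");
  return value != nullptr && *value != '\0';
}

// Applies the limit to the calling thread only, and only when the user
// has not already chosen a value through the environment.
void ApplyOmpLimitToCurrentThread(int limit) {
#ifdef _OPENMP
  if (limit > 0 && !UserSetOmpNumThreads()) {
    omp_set_num_threads(limit);
  }
#else
  (void)limit;  // OpenMP disabled: nothing to configure.
#endif
}

// Starts worker_count threads. Each worker applies the limit itself before
// doing any work, then records the limit it observes.
std::vector<int> RunWorkers(int worker_count, int limit) {
  assert(worker_count >= 0);
  std::vector<int> observed(worker_count, 1);
  std::vector<std::thread> workers;
  for (int i = 0; i < worker_count; ++i) {
    workers.emplace_back([i, limit, &observed] {
      ApplyOmpLimitToCurrentThread(limit);
#ifdef _OPENMP
      observed[i] = omp_get_max_threads();
#endif
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  return observed;
}
```

   The key design point is that the call happens inside the worker's own entry function, so every thread that may later enter a parallel region has the limit applied, while an explicit `OMP_NUM_THREADS` from the user is left untouched.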
   
   ## Actions Required
   As part of this investigation I have already implemented an ad-hoc fix, and I plan to polish it and
make it production-ready in the upcoming days. At this stage no action is requested from the
main contributors.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
