mxnet-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] FCInter opened a new issue #10625: ImageRecordIter loading small data reports out of memory
Date Fri, 20 Apr 2018 08:09:21 GMT
FCInter opened a new issue #10625: ImageRecordIter loading small data reports out of memory
URL: https://github.com/apache/incubator-mxnet/issues/10625
 
 
   ## Description
   (Brief description of the problem in no more than 2 sentences.)
   mx.io.ImageRecordIter reports CUDA out of memory when loading a 50MB .rec file.
   
   
   ## Environment info (Required)
   
   ----------Python Info----------
   ('Version      :', '2.7.14')
   ('Compiler     :', 'GCC 7.2.0')
   ('Build        :', ('default', 'Oct 16 2017 17:29:19'))
   ('Arch         :', ('64bit', ''))
   ------------Pip Info-----------
   ('Version      :', '9.0.1')
   ('Directory    :', '/home/users/mypath/anaconda2/lib/python2.7/site-packages/pip')
   ----------MXNet Info-----------
   /home/users/mypath/anaconda2/lib/python2.7/site-packages/urllib3/contrib/pyopenssl.py:46: DeprecationWarning: OpenSSL.rand is deprecated - you should use os.urandom instead
     import OpenSSL.SSL
   ('Version      :', '1.0.0')
   ('Directory    :', '/home/users/mypath/anaconda2/lib/python2.7/site-packages/mxnet')
   ('Commit Hash   :', '25720d0e3c29232a37e2650f3ba3a2454f9367bb')
   ----------System Info----------
   ('Platform     :', 'Linux-3.10.0-693.2.2.el7.x86_64-x86_64-with-centos-7.4.1708-Core')
   ('system       :', 'Linux')
   ('node         :', 'somename')
   ('release      :', '3.10.0-693.2.2.el7.x86_64')
   ('version      :', '#1 SMP Tue Sep 12 22:26:13 UTC 2017')
   ----------Hardware Info----------
   ('machine      :', 'x86_64')
   ('processor    :', 'x86_64')
   Architecture:          x86_64
   CPU op-mode(s):        32-bit, 64-bit
   Byte Order:            Little Endian
   CPU(s):                16
   On-line CPU(s) list:   0-15
   Thread(s) per core:    1
   Core(s) per socket:    8
   Socket(s):             2
   NUMA node(s):          2
   Vendor ID:             GenuineIntel
   CPU family:            6
   Model:                 79
   Model name:            Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
   Stepping:              1
   CPU MHz:               1499.121
   CPU max MHz:           3000.0000
   CPU min MHz:           1200.0000
   BogoMIPS:              4199.54
   Virtualization:        VT-x
   L1d cache:             32K
   L1i cache:             32K
   L2 cache:              256K
   L3 cache:              20480K
   NUMA node0 CPU(s):     0-7
   NUMA node1 CPU(s):     8-15
   Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
   ----------Network Test----------
   Setting timeout: 10
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0014 sec, LOAD: 1.4199 sec.
   Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0005 sec, LOAD: 2.7118 sec.
   Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0005 sec, LOAD: 1.3227 sec.
   Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0005 sec, LOAD: 0.7410 sec.
   Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0007 sec, LOAD: 0.5419 sec.
   Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0926 sec, LOAD: 0.8068 sec.
   
   
   Package used (Python/R/Scala/Julia):
   Python 2.7.14 |Anaconda, Inc.| (default, Oct 16 2017, 17:29:19)
   
   Compiler (gcc/clang/mingw/visual studio):
   gcc 7.2.0
   
   Build config:
   (Paste the content of config.mk, or the build command.)
   
   ## Error Message:
   The error message:
   [16:00:08] src/io/iter_image_recordio_2.cc:170: ImageRecordIOParser2: /home/users/mypath/work/image-classification/data/test_raw_3c.rec, use 4 threads for decoding..
   [16:00:08] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [16:00:08] src/storage/./pinned_memory_storage.h:59: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: out of memory
   
   Stack trace returned 10 entries:
   [bt] (0) /home/users/mypath/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x28965c) [0x7f0cc9dab65c]
   [bt] (1) /home/users/mypath/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x29cc4e9) [0x7f0ccc4ee4e9]
   [bt] (2) /home/users/mypath/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x29cc52d) [0x7f0ccc4ee52d]
   [bt] (3) /home/users/mypath/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x29ca628) [0x7f0ccc4ec628]
   [bt] (4) /home/users/mypath/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x23ea7a1) [0x7f0ccbf0c7a1]
   [bt] (5) /home/users/mypath/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x253ee36) [0x7f0ccc060e36]
   [bt] (6) /home/users/mypath/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x250ae44) [0x7f0ccc02ce44]
   [bt] (7) /home/users/mypath/anaconda2/bin/../lib/libstdc++.so.6(+0xafc5c) [0x7f0cb4196c5c]
   [bt] (8) /lib64/libpthread.so.0(+0x7e25) [0x7f0cec691e25]
   [bt] (9) /lib64/libc.so.6(clone+0x6d) [0x7f0cebcb634d]
   
   terminate called after throwing an instance of 'dmlc::Error'
     what():  [16:00:08] src/storage/./pinned_memory_storage.h:59: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: out of memory
   
   Stack trace returned 10 entries:
   [bt] (0) /home/users/mypath/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x28965c) [0x7f0cc9dab65c]
   [bt] (1) /home/users/mypath/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x29cc4e9) [0x7f0ccc4ee4e9]
   [bt] (2) /home/users/mypath/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x29cc52d) [0x7f0ccc4ee52d]
   [bt] (3) /home/users/mypath/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x29ca628) [0x7f0ccc4ec628]
   [bt] (4) /home/users/mypath/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x23ea7a1) [0x7f0ccbf0c7a1]
   [bt] (5) /home/users/mypath/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x253ee36) [0x7f0ccc060e36]
   [bt] (6) /home/users/mypath/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x250ae44) [0x7f0ccc02ce44]
   [bt] (7) /home/users/mypath/anaconda2/bin/../lib/libstdc++.so.6(+0xafc5c) [0x7f0cb4196c5c]
   [bt] (8) /lib64/libpthread.so.0(+0x7e25) [0x7f0cec691e25]
   [bt] (9) /lib64/libc.so.6(clone+0x6d) [0x7f0cebcb634d]
   
   Aborted (core dumped)
   
   
   The GPU information:
   +-----------------------------------------------------------------------------+
   | NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
   |-------------------------------+----------------------+----------------------+
   | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
   |===============================+======================+======================|
   |   0  TITAN X (Pascal)    Off  | 00000000:02:00.0 Off |                  N/A |
   | 58%   85C    P2   238W / 250W |  12097MiB / 12189MiB |     76%      Default |
   +-------------------------------+----------------------+----------------------+
   |   1  TITAN X (Pascal)    Off  | 00000000:03:00.0 Off |                  N/A |
   | 37%   62C    P2    59W / 250W |   4220MiB / 12189MiB |      0%      Default |
   +-------------------------------+----------------------+----------------------+
   |   2  TITAN X (Pascal)    Off  | 00000000:82:00.0 Off |                  N/A |
   | 35%   59C    P2    59W / 250W |   4220MiB / 12189MiB |      0%      Default |
   +-------------------------------+----------------------+----------------------+
   |   3  TITAN X (Pascal)    Off  | 00000000:83:00.0 Off |                  N/A |
   | 33%   57C    P2    57W / 250W |   4220MiB / 12189MiB |      0%      Default |
   +-------------------------------+----------------------+----------------------+
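
To make the table above easier to read at a glance, here is a minimal sketch that computes the free memory per GPU from the Memory-Usage column. The rows are copied by hand from the table; the helper name `free_mib_per_gpu` is hypothetical and not part of MXNet or nvidia-smi.

```python
import re

# Memory-usage rows copied from the nvidia-smi table above, in GPU index order.
SMI_ROWS = [
    "| 58%   85C    P2   238W / 250W |  12097MiB / 12189MiB |     76%      Default |",
    "| 37%   62C    P2    59W / 250W |   4220MiB / 12189MiB |      0%      Default |",
    "| 35%   59C    P2    59W / 250W |   4220MiB / 12189MiB |      0%      Default |",
    "| 33%   57C    P2    57W / 250W |   4220MiB / 12189MiB |      0%      Default |",
]

# Matches the "<used>MiB / <total>MiB" cell of a row.
MEM_PATTERN = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")


def free_mib_per_gpu(rows):
    """Return the free memory in MiB for each row (hypothetical helper)."""
    free = []
    for row in rows:
        match = MEM_PATTERN.search(row)
        if match is None:
            raise ValueError("no memory-usage cell found in row: %r" % row)
        used, total = int(match.group(1)), int(match.group(2))
        free.append(total - used)
    return free


print(free_mib_per_gpu(SMI_ROWS))  # [92, 7969, 7969, 7969]
```

By this reading, GPU 0 has only 92MiB free, while GPUs 1 to 3 each have 7969MiB free.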
   
   
   
   ## Minimum reproducible example
   (If you are using your own code, please provide a short script that reproduces the error.
Otherwise, please provide link to the existing example.)
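   import mxnet as mx

   # Path to the RecordIO file (about 50MB in my case).
   data_train = 'some_path/somefile.rec'

   # Creating the iterator alone triggers the error; no model is defined or bound.
   train_iter = mx.io.ImageRecordIter(
       path_imgrec=data_train,
       data_shape=(3, 128, 128),
       batch_size=1)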
   
   ## Steps to reproduce
   Just execute the above-mentioned code.
   
   ## What have you tried to solve it?
   
   1. I reduced the dataset to a 50MB .rec file, which is very small, but I still get the out-of-memory error.
   2. I read the API documentation for ImageRecordIter and found no argument for choosing which GPU the iterator uses.
   3. Why does ImageRecordIter use the GPU at all? I assumed it would not touch the GPU until a model is defined and the iterator is bound to that model. If I only load data with ImageRecordIter, without binding it or even defining a model, I would not expect it to use the GPU.
   4. Moreover, ImageRecordIter seems to use gpu0 by default, and there is no way to change which GPU it uses. My server has 4 GPUs, and only one of them is busy.
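
One workaround that might be worth trying, sketched here under assumptions: the failing check is in `pinned_memory_storage.h`, which suggests a page-locked (pinned) host allocation made through the CUDA runtime, and GPU 0 shows 12097MiB of 12189MiB in use. `CUDA_VISIBLE_DEVICES` is a standard CUDA environment variable that limits which devices a process can see. Whether hiding the busy GPU avoids this particular error is an assumption, not something confirmed in this report. The helper name `restrict_visible_gpus` is hypothetical.

```python
import os


def restrict_visible_gpus(gpu_ids):
    """Expose only the given physical GPUs to CUDA (hypothetical helper).

    Must run before any CUDA-using library (such as mxnet) is imported,
    because the variable is read when the CUDA runtime initializes.
    """
    value = ",".join(str(gpu_id) for gpu_id in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Hide the busy GPU 0; physical GPU 1 then appears to the process as device 0.
restrict_visible_gpus([1])

# Only after this point would the import and the iterator follow, e.g.:
# import mxnet as mx
# train_iter = mx.io.ImageRecordIter(path_imgrec='some_path/somefile.rec',
#                                    data_shape=(3, 128, 128), batch_size=1)
```

The same effect can be had without code changes by setting the variable in the shell before launching the script.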
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
