mxnet-dev mailing list archives

From "Chen, Ciyong" <ciyong.c...@intel.com>
Subject RE: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
Date Wed, 26 Jun 2019 11:17:49 GMT
Hi Pedro,

I'm looking into this case. I used the script "incubator-mxnet/example/image-classification/train_cifar10.py"
to collect timing data, but there doesn't seem to be much difference between MXNet 1.4.1.rc0
and 1.5.0.rc1 on C5.18xlarge.

I'm not sure whether the Python scripts differ. Could you point me to a link for
your script (cifar10.py)?
Alternatively, you could try MXNet's script (train_cifar10.py) and see how it performs.

Here's the command I used to collect the timings:
	python train_cifar10.py --num-epoch=5

1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
	real	9m4.880s
	user	333m13.340s
	sys	14m36.100s

2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
	real	9m2.155s
	user	329m37.092s
	sys	16m8.668s
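
For reference, the relative difference between two runs can be worked out from the "real" (wall-clock) times. The following is a minimal sketch; the helper names parse_time and relative_change are illustrative and not part of any MXNet script. It uses the timings above and the cifar10.py timings quoted further down in this thread.

```python
import re


def parse_time(value: str) -> float:
    """Convert a `time`-style duration such as '9m4.880s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def relative_change(new: str, old: str) -> float:
    """Return (new - old) / old as a percentage."""
    new_s, old_s = parse_time(new), parse_time(old)
    return (new_s - old_s) / old_s * 100.0


# Wall-clock ("real") times reported in this thread: (1.5.0.rc1, 1.4.1).
runs = {
    "train_cifar10.py on C5.18xlarge": ("9m4.880s", "9m2.155s"),
    "cifar10.py on c5d.18xlarge": ("11m30.388s", "10m41.994s"),
}

for name, (v150, v141) in runs.items():
    print(f"{name}: {relative_change(v150, v141):+.2f}% (1.5.0.rc1 vs 1.4.1)")
```

With these inputs the first pair differs by roughly 0.5% and the second by roughly 7.5%. Both are single runs, so run-to-run noise is not accounted for.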

-Ciyong


-----Original Message-----
From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com] 
Sent: Wednesday, June 26, 2019 6:28 AM
To: dev@mxnet.incubator.apache.org
Cc: dev@mxnet.apache.org
Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Hi, these were my build flags and system info:


--- # CMake configuration
USE_CUDA: "OFF" # Build with CUDA support
USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
USE_OPENCV: "ON" # Build with OpenCV support
USE_OPENMP: "ON" # Build with Openmp support
USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
USE_LAPACK: "ON" # Build with lapack support
USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
USE_JEMALLOC: "ON" # Build with Jemalloc support
USE_PROFILER: "ON" # Build with Profiler support
USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
USE_CPP_PACKAGE: "OFF" # Build C++ Package
USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
USE_GPROF: "OFF" # Compile with gprof (profiling) flag
USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
INSTALL_EXAMPLES: "OFF" # Install the example source files.
USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
USE_TENSORRT: "OFF" # Enable infeference optimization with TensorRT.
USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
CMAKE_BUILD_TYPE: "Release"
CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
CMAKE_C_COMPILER_LAUNCHER: "ccache"
CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
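
To reproduce a build with these settings by hand, each entry can be passed to CMake as a -DNAME=VALUE cache definition. The following is a minimal sketch; the to_cmake_args helper is hypothetical, only a subset of the flags is shown, and the ".." source directory is an assumption about where cmake is invoked from.

```python
# Subset of the flags listed above; names and values are copied from the list.
flags = {
    "USE_CUDA": "OFF",
    "USE_OPENMP": "ON",
    "USE_MKLDNN": "ON",
    "CMAKE_BUILD_TYPE": "Release",
}


def to_cmake_args(options):
    """Turn a {name: value} mapping into CMake -D cache definitions."""
    return [f"-D{name}={value}" for name, value in options.items()]


# ".." is an assumed source directory (an out-of-tree build directory).
command = " ".join(["cmake"] + to_cmake_args(flags) + [".."])
print(command)
```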

commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1, upstream/v1.5.x)
commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0, upstream/v1.4.x)

curl http://169.254.169.254/latest/meta-data/instance-type
c5d.18xlarge


----------Python Info----------
Version      : 3.6.7
Compiler     : GCC 8.2.0
Build        : ('default', 'Oct 22 2018 11:32:17')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 19.1.1
Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.5.0
Directory    : /home/piotr/mxnet_1.5/python/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
system       : Linux
node         : ip-172-31-63-171
release      : 4.15.0-1035-aws
version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              72
On-line CPU(s) list: 0-71
Thread(s) per core:  2
Core(s) per socket:  18
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
Stepping:            4
CPU MHz:             1326.446
BogoMIPS:            6000.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            25344K
NUMA node0 CPU(s):   0-17,36-53
NUMA node1 CPU(s):   18-35,54-71
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc
arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3
fma cx16 pcid
sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor
lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms
invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl
xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
----------Network Test----------

----------Python Info----------
Version      : 3.6.7
Compiler     : GCC 8.2.0
Build        : ('default', 'Oct 22 2018 11:32:17')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 19.1.1
Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.4.1
Directory    : /home/piotr/mxnet_1.4/python/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
system       : Linux
node         : ip-172-31-63-171
release      : 4.15.0-1035-aws
version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              72
On-line CPU(s) list: 0-71
Thread(s) per core:  2
Core(s) per socket:  18
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
Stepping:            4
CPU MHz:             1223.344
BogoMIPS:            6000.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            25344K
NUMA node0 CPU(s):   0-17,36-53
NUMA node1 CPU(s):   18-35,54-71
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc
arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3
fma cx16 pcid
sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor
lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms
invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl
xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
----------Network Test----------

On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
>
> I trained cifar10 on CPU, and there seems to be a regression: training
> time increased by about 7% compared with 1.4.1:
>
> (py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> (master)+$ time python cifar10.py --epochs 5
> real    11m30.388s
> user    417m7.766s
> sys     16m57.315s
>
> VS 1.4.1:
> real    10m41.994s
> user    392m40.646s
> sys     12m30.601s
>
>
> On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <royweilai@gmail.com> wrote:
> >
> > Hi Anirudh,
> >
> > Thanks for jumping into this quickly, I followed up on the issue.
> >
> > I meant for the sockeye developers/maintainers to help set up nightly
> > tests and raise issues early.
> >
> > Thanks!
> >
> > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin <haibin.lin.aws@gmail.com> wrote:
> >
> > > In GluonNLP we test against the MXNet nightly build for each PR,
> > > and the CI has caught some MXNet-related issues.
> > > I recommend that other toolkits also add integration tests with MXNet nightly.
> > > It helps identify issues early.
> > >
> > > Best,
> > > Haibin
> > >
> > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric <patric.zhao@intel.com> wrote:
> > >
> > > > Thanks for raising the issue; we will take a look ASAP.
> > > >
> > > > The downstream cases are not in the MXNet CI, so it's hard for MXNet
> > > > developers to catch potential bugs or performance degradation.
> > > >
> > > > In the future, I suggest adding the major downstream test cases, such as
> > > > those from sockeye, GluonNLP, GluonCV, DGL, and Gluon-TS, to the nightly test.
> > > > If that's still too heavy, maybe test them weekly or monthly :)
> > > >
> > > > Thanks,
> > > >
> > > > --Patric
> > > >
> > > > > -----Original Message-----
> > > > > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
> > > > > Sent: Friday, June 21, 2019 9:31 AM
> > > > > To: dev@mxnet.incubator.apache.org
> > > > > Cc: dev@mxnet.apache.org
> > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> > > > >
> > > > > Hi Lai,
> > > > >
> > > > > I have opened an issue:
> > > > > https://github.com/apache/incubator-mxnet/issues/15297
> > > > > I came to know about this issue only today and I have not been monitoring
> > > > > sockeye.
> > > > > I jumped onto this issue to make sure it wasn't caused by the dlpack changes.
> > > > > Also, I don't think sockeye CI checks against master; it is using 1.4.1.
> > > > >
> > > > > Anirudh
> > > > >
> > > > >
> > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei <royweilai@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Could you share which test failed, what the crash is, and how
> > > > > > to reproduce it?
> > > > > >
> > > > > > I was able to install sockeye, and all tests passed using
> > > > > > python setup.py test
> > > > > >
> > > > > > I have tested both the nightly pip package and 1.5.0.rc1.
> > > > > >
> > > > > > It would be great to create an issue with reproducible steps
> > > > > > and move the discussion there.
> > > > > >
> > > > > > Also, I see the sockeye nightly build[1] has been failing for
> > > > > > some time. If it's due to an MXNet change, please raise this early so we can
> > > > > > track and solve it in time rather than block the release during vote time.
> > > > > >
> > > > > > [1] https://travis-ci.org/awslabs/sockeye
> > > > > >
> > > > > >
> > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian <anirudh2290@gmail.com> wrote:
> > > > > >
> > > > > > > I was able to reproduce a crash with the commit
> > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not with the
> > > > > > > commit a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > > > > > >
> > > > > > > Anirudh
> > > > > > >
> > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei <royweilai@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Przemyslaw,
> > > > > > > >
> > > > > > > > Is there an issue with more details to track the problem?
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak <ptrendx@apache.org> wrote:
> > > > > > > >
> > > > > > > > > -1
> > > > > > > > >
> > > > > > > > > There is a crash in the sockeye unit test (python setup.py
> > > > > > > > > test), observed starting with the nightly 1.5 build from
> > > > > > > > > 6/13 and still occurring in 1.5rc1. I
> > > > > > > > > don't yet have the exact commit that is responsible
> > > > > > > > > for it, but it is either
> > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack
> > > > > > > > > related) or
> > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 (cached op
> > > > > > > > > optimization).
> > > > > > > > >
> > > > > > > > > On 2019/06/20 06:36:22, Lai Wei <royweilai@gmail.com> wrote:
> > > > > > > > > > Dear MXNet community,
> > > > > > > > > >
> > > > > > > > > > This is the 3-day vote to release Apache MXNet
> > > > > > > > > > (incubating) version 1.5.0.
> > > > > > > > > > Voting on dev@ will start June 19, 23:59:59 (PST)
> > > > > > > > > > and close on June 22, 23:59:59.
> > > > > > > > > >
> > > > > > > > > > 1) Link to release notes:
> > > > > > > > > >
> > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 2) Link to release candidate:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 3) Link to source and signatures on apache dist server:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > > > >
> > > > > > > > > > +1 = approve
> > > > > > > > > > +0 = no opinion
> > > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > > > --
> > > > > > > > > > Best Regards
> > > > > > > > > >
> > > > > > > > > > Lai
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards
> > > > > > > >
> > > > > > > > Lai
> > > > > > > >
> > > > > > >
> > > > > > --
> > > > > > Best Regards
> > > > > >
> > > > > > Lai
> > > > > >
> > > >
> > >
> > --
> > Best Regards
> >
> > Lai