From: Chris Olivier
Date: Thu, 22 Nov 2018 06:34:15 -0800
Subject: Re: [Discussion] MXNet CMake build - raise minimal required version
To: dev@mxnet.incubator.apache.org
Cc: dev@mxnet.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000764753057b41c3cd"

--000000000000764753057b41c3cd
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

I have not seen any proof that any crashes are due to LLVM OpenMP usage.

On Thu, Nov 22, 2018 at 2:35 AM Anton Chernov wrote:

> Dear MXNet community,
>
> I propose to raise the minimal required cmake version that is needed to
> build MXNet to 3.10, which was tagged on March 16 2018 [1].
>
> The effort of repairing the cmake scripts in general aims to deprecate
> make and maintain only 1 build system.
>
> *Need*
>
> The build system is the foundation of every software project. Its quality
> directly impacts the quality of the project. The MXNet build system is
> fragile, partially broken and not maintained.
>
> Users of MXNet and developers are confused by the fact that 2 build systems
> exist at the same time: make and CMake.
>
> The main functional areas which are impacted by the current state of the
> cmake files are:
>
> *OpenMP*
> The current CMake files mix OpenMP libraries from different compilers,
> which is undefined behaviour. It leads to nondeterministic crashes on some
> platforms. Build and deployment are very hard. No evidence exists that
> proves that there is any benefit of having the llvm OpenMP library as a
> submodule in MXNet.
>
> *BLAS and LAPACK*
> Basic math library usage is mixed up. It is hard and confusing to
> configure, and there is no logic for choosing the optimal library. MKL and
> OpenBLAS are intermixed in an unpredictable manner.
>
> *Profiling*
> The profiler is always on, even for production release builds, because
> MXNet cannot be built without it [2].
>
> *CUDA*
> CUDA is detected by 3 different files in the current cmake scripts, and the
> choice among them is based on obscure logic which involves different
> versions of cmake and the platforms it is building on:
>
> * CMakeLists.txt
> * cmake/FirstClassLangCuda.cmake
> * 3rdparty/mshadow/cmake/Cuda.cmake
>
>
> *Confusing and misleading cmake user options*
> For example, USE_CUDA / USE_OLDCMAKECUDA. Some of them will or will not do
> what they are supposed to, depending on the cmake generator and the version
> of cmake [3].
> There are currently more than 30 build parameters for MXNet, none of them
> documented. Some of them are not even located in the main CMakeLists.txt
> file, for example 'BLAS'.
>
>
> *Issues*
> There is a significant number of github issues related to cmake or the
> build in general. New tickets are opened frequently.
>
> * #8702 (https://github.com/apache/incubator-mxnet/issues/8702)
>   [DISCUSSION] Should we deprecate Makefile and only use CMake?
> * #5079 (https://github.com/apache/incubator-mxnet/issues/5079)
>   troubles building python interface on raspberry pi 3
> * #1722 (https://github.com/apache/incubator-mxnet/issues/1722)
>   problem: compile mxnet with hdfs
> * #11549 (https://github.com/apache/incubator-mxnet/issues/11549)
>   Pip package can be much faster (OpenCV version?)
> * #11417 (https://github.com/apache/incubator-mxnet/issues/11417)
>   libomp.so dependency (need REAL fix)
> * #8532 (https://github.com/apache/incubator-mxnet/issues/8532)
>   mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL
>   // (indirectly)
> * #11131 (https://github.com/apache/incubator-mxnet/issues/11131)
>   mxnet-cu92 low efficiency // (indirectly)
> * #10743 (https://github.com/apache/incubator-mxnet/issues/10743)
>   CUDA 9.1.xx failed if not set OLDCMAKECUDA on cmake 3.10.3 with unix
>   makefile or Ninja generator
> * #10742 (https://github.com/apache/incubator-mxnet/issues/10742)
>   typo in cpp-package/CMakeLists.txt
> * #10737 (https://github.com/apache/incubator-mxnet/issues/10737)
>   Cmake is running again when execute make install
> * #10543 (https://github.com/apache/incubator-mxnet/issues/10543)
>   Failed to build from source when set USE_CPP_PACKAGE = 1, fatal error
>   C1083: unabel to open file: “mxnet-cpp/op.h”: No such file or directory
> * #10217 (https://github.com/apache/incubator-mxnet/issues/10217)
>   Building with OpenCV causes link errors
> * #10175 (https://github.com/apache/incubator-mxnet/issues/10175)
>   MXNet MKLDNN build dependency/flow discussion
> * #10009 (https://github.com/apache/incubator-mxnet/issues/10009)
>   [CMAKE][IoT] Remove pthread from android_arm64 build
> * #9944 (https://github.com/apache/incubator-mxnet/issues/9944)
>   MXNet MinGW-w64 build error // (indirectly)
> * #9868 (https://github.com/apache/incubator-mxnet/issues/9868)
>   MKL and CMake
> * #9516 (https://github.com/apache/incubator-mxnet/issues/9516)
>   cmake cuda arch issues
> * #9105 (https://github.com/apache/incubator-mxnet/issues/9105)
>   libmxnet.so load path error
> * #9096 (https://github.com/apache/incubator-mxnet/issues/9096)
>   MXNet built with GPerftools crashes
> * #8786 (https://github.com/apache/incubator-mxnet/issues/8786)
>   Link failure on DEBUG=1 (static member symbol not defined)
>   // (indirectly)
> * #8729 (https://github.com/apache/incubator-mxnet/issues/8729)
>   Build amalgamation using a docker // (indirectly)
> * #8667 (https://github.com/apache/incubator-mxnet/issues/8667)
>   Compiler/linker error while trying to build from source on Mac OSX
>   Sierra 10.12.6
> * #8295 (https://github.com/apache/incubator-mxnet/issues/8295)
>   Building with cmake - error
> * #7852 (https://github.com/apache/incubator-mxnet/issues/7852)
>   Trouble installing MXNet on Raspberry Pi 3
> * #13303 (https://github.com/apache/incubator-mxnet/issues/13303)
>   mxnet-cpp package cross-compilation fails with OSError: "wrong ELF
>   class: ELFCLASS32"
> * #13245 (https://github.com/apache/incubator-mxnet/issues/13245)
>   mxnet::cpp::NDArray::WaitAll() take about 160ms on gtx1080ti
>   // (indirectly, cmake impact on performance)
> * #12849 (https://github.com/apache/incubator-mxnet/issues/12849)
>   [cmake][cpp-package] Building with cmake does not install the
>   cpp-package API
> * #12568 (https://github.com/apache/incubator-mxnet/issues/12568)
>   [Scala][macOS] Trying to build from source
> * #12134 (https://github.com/apache/incubator-mxnet/issues/12134)
>   why MKL and MKL-DNN can't be used simultaneously in ChooseBlas.cmake
> * #12107 (https://github.com/apache/incubator-mxnet/issues/12107)
>   Faulty CUDA detection with cmake
> * #11769 (https://github.com/apache/incubator-mxnet/issues/11769)
>   USE_BLAS=MKL fails due to mshadow requiring openblas
> * #11563 (https://github.com/apache/incubator-mxnet/issues/11563)
>   Deprecate USE_PROFILER from make/cmake
> * #10856 (https://github.com/apache/incubator-mxnet/issues/10856)
>   Failed OpenMP assertion when loading MXNet compiled with DEBUG=1
>
>
> *Approach*
>
> We are going to iteratively fix and simplify the cmake build system and,
> once it is possible, deprecate and remove the make system.
> These PRs have been opened so far:
>
>
> * #11148 (https://github.com/apache/incubator-mxnet/pull/11148)
>   [MXNET-679] Refactor handling BLAS libraries with cmake
> * #12160 (https://github.com/apache/incubator-mxnet/pull/12160)
>   Remove conflicting llvm OpenMP from cmake builds
> * #10564 (https://github.com/apache/incubator-mxnet/pull/10564)
>   Simplified CUDA language detection in cmake
> * #10530 (https://github.com/apache/incubator-mxnet/pull/10530)
>   Jetson build with cmake and CUDA
>
> Unfortunately, none of them has been successful. The question of updating
> the minimal required version was not asked before, so I'm raising it now.
>
> By upgrading the version we would remove all custom error-prone cmake
> files that are related to CUDA, BLAS and LAPACK, essentially covering most
> of the problems.
>
> OpenMP and profiling would need to be addressed separately.
>
> *Benefit*
>
> Ease of maintaining the MXNet build, clarity for users, quality and
> predictability.
>
> *Alternatives*
>
> * Leave the situation as is
> * Proceed with the make build
>
>
> I would appreciate hearing your thoughts.
>
> Best
> Anton
>
> [1] https://github.com/Kitware/CMake/releases/tag/v3.10.3
> [2] https://github.com/apache/incubator-mxnet/issues/11563
> [3]
> https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L46-L57

--000000000000764753057b41c3cd--
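As a rough illustration of what the quoted proposal relies on, the fragment
below sketches how a CMakeLists.txt that requires CMake 3.10 could lean on
stock CMake features instead of custom detection files: CUDA as a first-class
language (supported since CMake 3.8), the OpenMP imported target from
FindOpenMP (since CMake 3.9), and the bundled FindBLAS and FindLAPACK modules.
The names mxnet_sketch and mxnet_sketch_deps, and the way USE_CUDA, USE_OPENMP
and USE_LAPACK are declared here, are assumptions made for illustration. This
is not the content of MXNet's build files or of any PR listed in the thread.

```cmake
# Hypothetical sketch only; all target and option names are illustrative.
cmake_minimum_required(VERSION 3.10)
project(mxnet_sketch LANGUAGES C CXX)

option(USE_CUDA   "Build with CUDA support"   OFF)
option(USE_OPENMP "Build with OpenMP support" ON)
option(USE_LAPACK "Link against LAPACK"       ON)

# Interface target that only collects dependencies, so the sketch
# configures without any source files.
add_library(mxnet_sketch_deps INTERFACE)

# CUDA as a first-class language: probe for a compiler, then enable it.
if(USE_CUDA)
  include(CheckLanguage)
  check_language(CUDA)
  if(CMAKE_CUDA_COMPILER)
    enable_language(CUDA)
  else()
    message(FATAL_ERROR "USE_CUDA=ON but no CUDA compiler was found")
  endif()
endif()

# OpenMP from the compiler actually in use, via the imported target,
# rather than a separately bundled runtime.
if(USE_OPENMP)
  find_package(OpenMP REQUIRED)
  target_link_libraries(mxnet_sketch_deps INTERFACE OpenMP::OpenMP_CXX)
endif()

# BLAS and LAPACK through the stock find modules. A vendor can be
# selected at configure time, e.g. -DBLA_VENDOR=OpenBLAS.
find_package(BLAS REQUIRED)
target_link_libraries(mxnet_sketch_deps INTERFACE ${BLAS_LIBRARIES})
if(USE_LAPACK)
  find_package(LAPACK REQUIRED)
  target_link_libraries(mxnet_sketch_deps INTERFACE ${LAPACK_LIBRARIES})
endif()
```

A real library target would then link against mxnet_sketch_deps. Whether this
arrangement resolves the OpenMP crashes disputed in the reply above is not
something the sketch can show; it only shows the mechanisms the proposal names.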