From: "Lv, Tao A" <tao.a.lv@intel.com>
To: dev@mxnet.incubator.apache.org
Subject: RE: [Discuss] Upgrade MKL-DNN submodule to its v1.0 release
Date: Fri, 1 Nov 2019 02:49:58 +0000

Hi dev,

The feature branch mkldnn-v1.0 has been merged to master. I really appreciate your support for this task.

Branch: https://github.com/apache/incubator-mxnet/tree/mkldnn-v1.0
Project: https://github.com/apache/incubator-mxnet/projects/16
PR: https://github.com/apache/incubator-mxnet/pull/16555

If possible, please help verify the latest master branch in your downstream projects and feel free to report any issues you find.
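For anyone doing that verification, a quick sanity check could look like the minimal sketch below. It assumes a master build configured with USE_MKLDNN=1 and uses the mxnet.runtime feature API available in recent builds; setting MKLDNN_VERBOSE=1 in the environment should additionally print per-primitive logs from the library itself.

# Minimal sanity check that an MXNet master build actually picks up MKL-DNN.
# Assumes the build was configured with USE_MKLDNN=1.
import mxnet as mx
from mxnet.runtime import Features

print("MKLDNN enabled:", Features().is_enabled("MKLDNN"))

# A small CPU convolution; with MKL-DNN enabled this is expected to be
# dispatched to an MKL-DNN convolution primitive (run with MKLDNN_VERBOSE=1
# to see the library's own log lines).
data = mx.nd.random.uniform(shape=(1, 3, 224, 224), ctx=mx.cpu())
weight = mx.nd.random.uniform(shape=(16, 3, 3, 3), ctx=mx.cpu())
bias = mx.nd.zeros((16,), ctx=mx.cpu())
out = mx.nd.Convolution(data=data, weight=weight, bias=bias,
                        kernel=(3, 3), num_filter=16)
mx.nd.waitall()
print("conv output shape:", out.shape)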
Thanks,
-tao

-----Original Message-----
From: Lv, Tao A
Sent: Sunday, July 28, 2019 11:55 PM
To: dev@mxnet.incubator.apache.org
Cc: Zhao, Patric; Ye, Jason Y
Subject: RE: [Discuss] Upgrade MKL-DNN submodule to its v1.0 release

Update: I just cut the feature branch for MKL-DNN 1.0 integration: https://github.com/apache/incubator-mxnet/tree/mkldnn-v1.0

Thanks,
-tao

-----Original Message-----
From: Lv, Tao A
Sent: Friday, July 26, 2019 10:21 PM
To: dev@mxnet.incubator.apache.org
Cc: Zhao, Patric; Ye, Jason Y
Subject: RE: [Discuss] Upgrade MKL-DNN submodule to its v1.0 release

It seems we don't have any objections. I will try to cut the feature branch in the following days.

Thanks,
-tao

-----Original Message-----
From: Lv, Tao A
Sent: Saturday, July 20, 2019 11:06 PM
To: dev@mxnet.incubator.apache.org
Cc: Zhao, Patric; Ye, Jason Y
Subject: [Discuss] Upgrade MKL-DNN submodule to its v1.0 release

Hi dev,

MKL-DNN just published its first major release this month: https://github.com/intel/mkl-dnn/releases/tag/v1.0. Here I would like to start a discussion about upgrading the MKL-DNN integration from the current v0.20 to v1.0.

Motivation

To improve the general look and feel of the library and solve a few important design issues, the v1.0 major release changes some of the data structures, primitive APIs and the execution model, and accordingly breaks compatibility with v0.x versions. The change details are mostly covered in the RFC for v1.0. The major changes are listed below:

* Support large tensors with int64_t dimension sizes.
* Expose the scratchpad to support stateless primitives and better memory management, and hence thread safety.
* Pass memory and stream to the primitive at execution time.
* Rework the MKL-DNN memory descriptor.
* Split LSTM/GRU/RNN into different primitives.
* Remove the MKLML dependency and stop releasing the MKLML and iomp packages from the MKL-DNN repository.
* Support Intel integrated graphics.

With these changes, we can resolve or mitigate several existing MXNet issues, e.g. #15576 for thread safety, #15544 for the MKLML/iomp5 license issue, and the int64 tensor size limitation of the MKL-DNN backend. Besides that, all new features will go to v1.x and will not be back-ported to v0.x, so MXNet needs to update its MKL-DNN dependency to v1.0 to better leverage new features and performance improvements.

Development

Basically we will follow the same integration methodology we used for the v0.x integration, including operator implementation, registration, NDArray modification and graph partitioning. For better collaboration within the community, we will have a feature branch for the development and validation of the MKL-DNN 1.0 integration. All PRs to the feature branch should pass code review and CI and finally get committer approval. The development can be divided into 3 parts, and all the work will be done before Q3'19 ends. During development, the feature branch will be synced to the master branch periodically.

* P1: make/cmake build with MKL-DNN v1.0, all FP32 CNN operators integration (in src/operator/nn/mkldnn/). We can do FP32 training and inference for CNN models after P1 is done.
* P2: quantization pass, INT8 operators integration (in src/operator/quantization/mkldnn). We can do INT8 quantization and INT8 inference after P2 is done (a usage-level sketch of this flow follows after this list).
* P3: RNN operators integration.
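To give a sense of what P2 covers from the user side, below is a rough sketch of the existing MXNet quantization flow that the MKL-DNN INT8 operators plug into. The tiny network, shapes and calibration settings are placeholders chosen only for illustration; the quantize_model helper is the one currently shipped in mxnet.contrib.quantization, and the sketch assumes its interface stays the same after the upgrade.

# Rough sketch of the INT8 quantization flow the MKL-DNN backend plugs into.
import mxnet as mx
from mxnet.contrib.quantization import quantize_model

# A tiny FP32 network stands in for a real model (e.g. a Gluon-CV ResNet).
data = mx.sym.Variable("data")
net = mx.sym.Convolution(data, kernel=(3, 3), num_filter=16, name="conv0")
net = mx.sym.Activation(net, act_type="relu", name="relu0")
net = mx.sym.Flatten(net)
net = mx.sym.FullyConnected(net, num_hidden=10, name="fc0")
net = mx.sym.SoftmaxOutput(net, name="softmax")

# Initialize FP32 parameters with a throwaway Module.
mod = mx.mod.Module(net, data_names=["data"], label_names=["softmax_label"],
                    context=mx.cpu())
mod.bind(data_shapes=[("data", (8, 3, 32, 32))],
         label_shapes=[("softmax_label", (8,))])
mod.init_params()
arg_params, aux_params = mod.get_params()

# Quantize the symbol and parameters. calib_mode="none" keeps the sketch
# self-contained; a real run would use calib_mode="naive" or "entropy" with
# a DataIter of representative inputs passed via calib_data.
qsym, qarg_params, qaux_params = quantize_model(
    sym=net, arg_params=arg_params, aux_params=aux_params,
    ctx=mx.cpu(),                 # MKL-DNN INT8 kernels run on the CPU
    excluded_sym_names=[],        # layers to keep in FP32, if any
    calib_mode="none",
    quantized_dtype="int8")

# The quantized symbol now contains the quantized operator variants.
print(qsym.tojson()[:300])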
If needed, documentation will be revised accordingly during development.

Validation:
* Use the feature branch for development: all PRs should pass MXNet CI.
* Disable MKL-DNN related tests at the beginning of development and re-enable them incrementally as development proceeds.
* Intel internal validation: mainly focused on performance and convergence validation on CPU, with models from the MXNet examples, Gluon-CV and Gluon-NLP.

Criteria for development done:
* MXNet CI: pass all existing unit tests and nightly tests.
* Accuracy: pass training convergence and inference accuracy validation.
* Performance: achieve FP32/INT8 performance similar to the v0.x integration.

Upstreaming to master branch:
After development is done, we will start to upstream the feature branch to the master branch. Since we cannot have two MKL-DNN libraries in MXNet simultaneously, the upstreaming should be done in a single PR. The PR will possibly be large, so I hope the community can take the time to review and comment during development of the feature branch.

We need to do our best to make this happen before the 1.6.0 release so we can address the license issue raised in the 1.5.0 vote.

Please let me know what you think about this plan. If you think something should be fixed or improved in this integration, please let me know as well.

thanks,
-tao (on behalf of the Intel MXNet team)
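For reference, the FP32 CPU inference checks mentioned under validation might look roughly like the sketch below. The model (a Gluon model-zoo ResNet-18), input shape and iteration count are arbitrary choices for illustration; Gluon-CV and Gluon-NLP models would be exercised the same way.

# Rough FP32 CPU inference timing sketch of the kind used to compare the
# v0.x and v1.0 MKL-DNN integrations. Model and input shape are arbitrary.
import time
import mxnet as mx
from mxnet.gluon.model_zoo import vision

net = vision.resnet18_v1(pretrained=False)
net.initialize(ctx=mx.cpu())
net.hybridize(static_alloc=True, static_shape=True)

x = mx.nd.random.uniform(shape=(1, 3, 224, 224), ctx=mx.cpu())
net(x).wait_to_read()  # warm-up; triggers MKL-DNN primitive creation

start = time.time()
for _ in range(100):
    net(x).wait_to_read()
print("average latency: %.2f ms" % ((time.time() - start) / 100 * 1000))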