From: Marco de Abreu
Date: Tue, 8 May 2018 17:24:30 +0200
Subject: Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2
To:
dev@mxnet.incubator.apache.org

Sorry for the vague phrasing; it is back to normal. This can be verified at [1]. I agree with Kellen; we will actively be working with the maintainers of dockcross to ensure their repository is brought back to a stable state which also provides proper tagging.

+1 from my side now.

[1]: http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/incubator-mxnet/branches/PR-10850/runs/1/nodes/67/steps/329/log/?start=0

On Tue, May 8, 2018 at 4:42 PM, kellen sunderland <kellen.sunderland@gmail.com> wrote:
> Thanks Marco for the work-arounds and for getting this fixed in CI. I personally don't see this as a release blocker as it's targeting a still experimental feature (Jetson pip wheels). I also have a pretty high level of confidence that we can fix this by working with the dockcross org. This would mean that, in the future, this release cut would still work for users who are interested in building the 1.2 release for their Jetson devices.
>
> On Tue, May 8, 2018 at 3:46 PM, Steffen Rochel wrote:
>> "Should be" back or "is" back to normal? Would you please verify and update your vote on dev@ accordingly? Currently you are on record as -1. Just trying to help Anirudh get a proper vote count.
>>
>> Thanks
>> Steffen (MXNet contributor hat on)
>>
>> On Tue, May 8, 2018 at 6:37 AM Marco de Abreu <marco.g.abreu@googlemail.com> wrote:
>>> Yes, sorry for the inconvenience! We fixed the root cause and everything should be back to normal.
>>>
>>> -Marco
>>>
>>> On Tue, 8 May 2018, 14:59, Steffen Rochel wrote:
>>>> Marco - thanks for your efforts. Does this unblock the Apache MXNet v1.2 release and change your vote?
>>>>
>>>> On Tue, May 8, 2018 at 3:00 AM Marco de Abreu <marco.g.abreu@googlemail.com> wrote:
>>>>> Small update regarding the ARM64 builds. I have created two pull requests [1][2] which change the repository to a mirror I created. This mirror was created using a cached version of the working Docker image, effectively reverting the state back to a working one. At the same time, this pins the container to prevent any further problems.
>>>>>
>>>>> I would prefer to use the public repository instead of our own mirror, but for now, this is inevitable. If anybody would like to be added to the Dockerhub organization "mxnetci", feel free to let me know! To prevent problems like these in the future, I created a feature request at [3] to ensure future releases of that Docker image are properly tagged. Additionally, the creator of the failing PR is aware and actively involved in creating a permanent solution [4].
>>>>>
>>>>> Best regards,
>>>>> Marco
>>>>>
>>>>> [1]: https://github.com/apache/incubator-mxnet/pull/10850
>>>>> [2]: https://github.com/apache/incubator-mxnet/pull/10849
>>>>> [3]: https://github.com/dockcross/dockcross/issues/223
>>>>> [4]: https://github.com/dockcross/dockcross/pull/221
>>>>>
>>>>> On Tue, May 8, 2018 at 2:39 AM, Lai Wei wrote:
>>>>>> Hi Anirudh,
>>>>>>
>>>>>> Update: Did an install on a fresh instance with USE_MKLDNN=1; it works fine now. Pip install with --pre is also working fine. The problem was the mkl-dnn I had installed on the old instance. Closing the issue <https://github.com/awslabs/keras-apache-mxnet/issues/75>.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Best Regards
>>>>>> Lai Wei
>>>>>> https://www.linkedin.com/pub/lai-wei/2b/731/52b
>>>>>>
>>>>>> On Mon, May 7, 2018 at 2:48 PM, Lai Wei wrote:
>>>>>>> Hi Anirudh,
>>>>>>>
>>>>>>> Yes, I also tried that; it didn't resolve the issue. Looking into the root cause and will update.
>>>>>>>
>>>>>>> Best Regards
>>>>>>> Lai Wei
>>>>>>>
>>>>>>> On Mon, May 7, 2018 at 2:15 PM, Anirudh wrote:
>>>>>>>> Hi Lai,
>>>>>>>>
>>>>>>>> I see that you used USE_MKL2017_EXPERIMENTAL=1; I am not sure if this is the right flag. Did you try USE_MKLDNN=1?
>>>>>>>>
>>>>>>>> Anirudh
>>>>>>>>
>>>>>>>> On Mon, May 7, 2018 at 1:22 PM, Lai Wei wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I would like to raise an issue with mxnet-mkl. The keras-mxnet package was working fine with mxnet-mkl 1.1.0 for training on CPU. However, weights are not updated when I use mxnet-mkl 1.2.0b20180507. I tried both 'pip install mxnet-mkl --pre' and building from source from the release branch (v1.2.0) with the MKL flag.
>>>>>>>>>
>>>>>>>>> Please refer to this issue for more details: https://github.com/awslabs/keras-apache-mxnet/issues/75
>>>>>>>>>
>>>>>>>>> There is no code change on the keras-mxnet side, so I guess some API broke when using the latest mxnet-mkl. Still working on finding the root cause.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> Best Regards
>>>>>>>>> Lai Wei
>>>>>>>>>
>>>>>>>>> On Mon, May 7, 2018 at 10:38 AM, Haibin Lin <haibin.lin.aws@gmail.com> wrote:
>>>>>>>>>> +1 binding. Built from source with CUDA, ran the linear classification example, and it works fine.
>>>>>>>>>>
>>>>>>>>>> Best.
>>>>>>>>>> Haibin
>>>>>>>>>>
>>>>>>>>>> On Sun, May 6, 2018 at 10:08 PM, Steffen Rochel <steffenrochel@gmail.com> wrote:
>>>>>>>>>>> +1 (non-binding). Tested with selected notebooks from The Straight Dope. So many important enhancements that everybody contributed and our users are waiting for. Hope we will see more votes.
>>>>>>>>>>> Steffen
>>>>>>>>>>>
>>>>>>>>>>> On Mon, May 7, 2018 at 1:07 AM Anirudh <anirudh2290@gmail.com> wrote:
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> Since we don't have enough binding votes yet, I am extending the vote till tomorrow (Monday May 7th), 12:50 PM PDT.
>>>>>>>>>>>>
>>>>>>>>>>>> Anirudh
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, May 6, 2018 at 4:05 PM, Anirudh <anirudh2290@gmail.com> wrote:
>>>>>>>>>>>>> Hi Pedro,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the clarification.
>>>>>>>>>>>>> I was able to reproduce the issue with USE_OPENMP=OFF. I wasn't able to reproduce the issue with Make. Since the issue is not reproducible with make and the number of customers using USE_OPENMP=OFF with cmake should be small, I agree with you that this should not be a blocker. I have added the issue to the known issues in the release notes: https://github.com/apache/incubator-mxnet/releases/tag/1.2.0.rc2
>>>>>>>>>>>>>
>>>>>>>>>>>>> Anirudh
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, May 6, 2018 at 9:03 AM, Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
>>>>>>>>>>>>>> Agreed, I was not aware that the problems were not present in the release branch.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, May 4, 2018 at 8:32 PM, Haibin Lin <haibin.lin.aws@gmail.com> wrote:
>>>>>>>>>>>>>>> I agree with Anirudh that the focus of the discussion should be limited to the release branch, not the master branch. Anything that breaks on master but works on the release branch should not block the release itself.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Haibin
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, May 4, 2018 at 10:58 AM, Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
>>>>>>>>>>>>>>>> I see your point.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I checked the failures on the v1.2.0 branch and I don't see segfaults, just minor failures due to flaky tests.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I will trigger it repeatedly a few times until Sunday and change my vote accordingly.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.2.0/
>>>>>>>>>>>>>>>> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/17/pipeline
>>>>>>>>>>>>>>>> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/15/pipeline/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Pedro.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, May 4, 2018 at 7:16 PM, Anirudh <anirudh2290@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Hi Pedro,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank you for the suggestions.
>>>>>>>>>>>>>>>>> I will try to reproduce this without fixed seeds and also run it for a longer time.
>>>>>>>>>>>>>>>>> Having said that, running unit tests over and over for a couple of days will likely cause problems, because there are around 42 open issues for flaky tests: https://github.com/apache/incubator-mxnet/issues?q=is%3Aopen+is%3Aissue+label%3AFlaky
>>>>>>>>>>>>>>>>> Also, the release branch diverged from master around 3 weeks back, and it doesn't have many of the changes merged to master.
>>>>>>>>>>>>>>>>> So, my question essentially is: what will be your benchmark to accept the release?
>>>>>>>>>>>>>>>>> Is it that we run the test which you provided on 1.2 without fixed seeds and for a longer duration without failures?
>>>>>>>>>>>>>>>>> Or is it that all unit tests should pass over a period of 2 days without issues? This may require fixing all of the flaky tests, which would delay the release by a considerable amount of time.
>>>>>>>>>>>>>>>>> Or is it something else?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Anirudh
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, May 4, 2018 at 4:49 AM, Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Could you remove the fixed seeds and run it for a couple of hours with an additional loop? Also, I would suggest running the unit tests over and over for a couple of days if possible.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Pedro.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, May 3, 2018 at 8:33 PM, Anirudh <anirudh2290@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Hi Pedro and Naveen,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I am unable to reproduce this issue with MKLDNN on the master but not on the 1.2.RC2 branch.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Did the following on the 1.2.RC2 branch:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_DIST_KVSTORE=0 USE_CUDA=0 USE_CUDNN=0 USE_MKLDNN=1
>>>>>>>>>>>>>>>>>>> export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
>>>>>>>>>>>>>>>>>>> export MXNET_TEST_SEED=11
>>>>>>>>>>>>>>>>>>> export MXNET_MODULE_SEED=812478194
>>>>>>>>>>>>>>>>>>> export MXNET_TEST_COUNT=10000
>>>>>>>>>>>>>>>>>>> nosetests-2.7 -v tests/python/unittest/test_module.py:test_forward_reshape
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Was able to do the 10k runs successfully.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Anirudh
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, May 3, 2018 at 8:46 AM, Anirudh <anirudh2290@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> Hi Pedro and Naveen,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Is this issue reproducible when MXNet is built with USE_MKLDNN=0?
>>>>>>>>>>>>>>>>>>>> Also, there are a bunch of MKLDNN fixes that didn't go into the release branch. Is this issue reproducible on the release branch?
>>>>>>>>>>>>>>>>>>>> In my opinion, since we have marked MKLDNN as an experimental feature for the release, if it is confirmed to be an MKLDNN issue we don't need to block the release on it.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Anirudh
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, May 3, 2018 at 6:58 AM, Naveen Swamy <mnnaveen@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>> Thanks for raising this issue, Pedro.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> -1 (binding)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> We were in a similar state for a while a year ago; a lot of effort went into stabilizing the tests and the CI. I have seen that the PR builds are non-deterministic and you have to retry over and over (wasting resources and time) and hope you get lucky.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Look at the dashboard for the master build: http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/master/
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> -Naveen
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, May 3, 2018 at 5:11 AM, Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> -1: nondeterministic failures on CI master: https://issues.apache.org/jira/browse/MXNET-396
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Was able to reproduce once in a fresh p3 instance with DLAMI; can't reproduce consistently.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, May 2, 2018 at 9:51 PM, Anirudh <anirudh2290@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> As part of the RC2 release, we have addressed bugs and some concerns that were raised.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I would like to propose a vote to release Apache MXNet (incubating) version 1.2.0.RC2. Voting will start now (Wednesday, May 2nd) and end at 12:50 PM PDT, Sunday, May 6th.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Link to release notes:
>>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.2.0+Release+Notes
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Link to release candidate 1.2.0.rc2:
>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/releases/tag/1.2.0.rc2
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Voting results for 1.2.0.rc2:
>>>>>>>>>>>>>>>>>>>>>>> https://lists.apache.org/thread.html/ebe561c609a8e32351dfe4aafc8876199560336472726b58c3455e85@%3Cdev.mxnet.apache.org%3E
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> View this page, click on "Build from Source", and use the source code obtained
>>>>>>>>>>>>>>>>>>>>>>> from the 1.2.0.rc2 tag:
>>>>>>>>>>>>>>>>>>>>>>> https://mxnet.incubator.apache.org/install/index.html
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> (Note: The README.md points to the 1.2.0 tag and does not work at the moment.)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Please remember to test first before voting accordingly:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> +1 = approve
>>>>>>>>>>>>>>>>>>>>>>> +0 = no opinion
>>>>>>>>>>>>>>>>>>>>>>> -1 = disapprove (provide reason)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Anirudh