From: sandeep krishnamurthy
Date: Wed, 8 Nov 2017 21:32:20 -0800
Subject: Re: [Proposal] Stabilizing Apache MXNet CI build system
To: dev@mxnet.incubator.apache.org

Good work Meghna and thanks to community members for participating in the
discussion and providing valuable inputs.

Yes please share the document again and ask for vote and more broader
inputs.

On Wed, Nov 8, 2017 at 2:43 PM, Chris Olivier wrote:

> +1
>
> On Wed, Nov 8, 2017 at 2:40 PM Meghna Baijal wrote:
>
> > Thanks for the active discussion on the document for the new CI for
> > MXNet. Now that many of you have reviewed it, do you think I should
> > start a vote on which framework the community wants to move forward
> > with ?
> >
> > Thanks,
> > Meghna
> >
> > On Mon, Nov 6, 2017 at 6:59 PM, Chris Olivier wrote:
> >
> > > After a decision is reached, i am willing to add tasks to Apache
> > > MXNet JIRA
> > >
> > > On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy
> > > <pedro.larroy.lists@gmail.com> wrote:
> > >
> > > > Thanks for setting up the document guys, looks like a solid basis
> > > > to start to work on!
> > > >
> > > > Marco, Kellen and I have already added some comments.
> > > >
> > > > Pedro
> > > >
> > > > On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal wrote:
> > > > > Kellen, Thank you for your comments in the doc.
> > > > > Sure Steffen, I will continue to merge everyone’s comments into
> > > > > the doc and work with Pedro to finalize it.
> > > > > And then we can vote on the options.
> > > > >
> > > > > Thanks,
> > > > > Meghna Baijal
> > > > >
> > > > > On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel
> > > > > <steffenrochel@gmail.com> wrote:
> > > > >
> > > > >> Sandeep and Meghna have been working in background collecting
> > > > >> input and preparing a doc. I suggest to drive discussion
> > > > >> forward and would like to ask everybody to contribute to
> > > > >> https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDkdlavUDASzUmLjk/edit?usp=sharing
> > > > >>
> > > > >> Lets converge on requirements and architecture, so we can move
> > > > >> forward with implementation.
> > > > >>
> > > > >> I would like to suggest for Pedro and Meghna to lead the
> > > > >> discussion and help to resolve suggestions.
> > > > >>
> > > > >> I assume we need a vote once we are converged on a good draft
> > > > >> to call it a plan and move forward with implementation. As we
> > > > >> all are unhappy with the current CI situation I would also
> > > > >> suggest a phased approach, so we can get back to reliable and
> > > > >> efficient basic CI quickly and add advanced capabilities over
> > > > >> time.
> > > > >>
> > > > >> Steffen
> > > > >>
> > > > >> On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland
> > > > >> <kellen.sunderland@gmail.com> wrote:
> > > > >>
> > > > >> > Hey Henri, I think that's what a few of us are advocating.
> > > > >> > Running a set of quick tests as part of the PR process, and
> > > > >> > then a more detailed regression test suite periodically (say
> > > > >> > every 4 hours). This fits nicely into a tagging or 2 branch
> > > > >> > development system. Commits will be tagged (or merged into a
> > > > >> > stable branch) as soon as they pass the detailed regression
> > > > >> > testing.
> > > > >> >
> > > > >> > On Wed, Nov 1, 2017 at 9:07 PM, Hen wrote:
> > > > >> >
> > > > >> > > Random question - can the CI be split such that the Apache
> > > > >> > > CI is doing a basic set of checks on that hardware, and is
> > > > >> > > hooked to a PR, while there is a larger "Is trunk good for
> > > > >> > > release?" test that is running periodically rather than on
> > > > >> > > every PR?
> > > > >> > >
> > > > >> > > ie: do we need each PR to be run on varied hardware, or can
> > > > >> > > we have this two tier approach?
> > > > >> > >
> > > > >> > > Hen
> > > > >> > >
> > > > >> > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy
> > > > >> > > <sandeep.krishna98@gmail.com> wrote:
> > > > >> > >
> > > > >> > > > Hello all,
> > > > >> > > >
> > > > >> > > > I am hereby opening up a discussion thread on how we can
> > > > >> > > > stabilize Apache MXNet CI build system.
> > > > >> > > >
> > > > >> > > > Problems:
> > > > >> > > >
> > > > >> > > > ========
> > > > >> > > >
> > > > >> > > > Recently, we have seen following issues with Apache MXNet
> > > > >> > > > CI build systems:
> > > > >> > > >
> > > > >> > > >    1. Apache Jenkins master is overloaded and we see
> > > > >> > > >    issues like - unable to trigger builds, difficult to
> > > > >> > > >    load and view the blue ocean and other Jenkins build
> > > > >> > > >    status page.
> > > > >> > > >    2. We are generating too many request/interaction on
> > > > >> > > >    Apache Infra team.
> > > > >> > > >       1. Addition/deletion of new slave: Caused from
> > > > >> > > >       scaling activity, recycling, troubleshooting or any
> > > > >> > > >       actions leading to change of slave machines.
> > > > >> > > >       2. Plugins / other Jenkins Master configurations.
> > > > >> > > >       3. Experimentation on CI pipelines.
> > > > >> > > >    3. Harder to debug and resolve issues - Since access to
> > > > >> > > >    master and slave is not with the same community, it
> > > > >> > > >    requires Infra and community to dive deep together on
> > > > >> > > >    all action items.
> > > > >> > > >
> > > > >> > > > Possible Solutions:
> > > > >> > > >
> > > > >> > > > ==============
> > > > >> > > >
> > > > >> > > >    1. Can we set up a separate Jenkins CI build system for
> > > > >> > > >    Apache MXNet outside Apache Infra?
> > > > >> > > >    2. Can we have a separate Jenkins Master in Apache
> > > > >> > > >    Infra for MXNet?
> > > > >> > > >    3. Review design of current setup, refine and fill the
> > > > >> > > >    gaps.
> > > > >> > > >
> > > > >> > > > @ Mentors/Infra team/Community:
> > > > >> > > >
> > > > >> > > > ==========================
> > > > >> > > >
> > > > >> > > > Please provide your suggestions on how we can proceed
> > > > >> > > > further and work on stabilizing the CI build systems for
> > > > >> > > > MXNet.
> > > > >> > > >
> > > > >> > > > Also, if the community decides on separate Jenkins CI
> > > > >> > > > build system, what important points should be taken care
> > > > >> > > > of apart from the below:
> > > > >> > > >
> > > > >> > > >    1. Community being able to access the build page for
> > > > >> > > >    build statuses.
> > > > >> > > >    2. Committers being able to login with apache
> > > > >> > > >    credentials.
> > > > >> > > >    3. Hook setup from apache/incubator-mxnet repo to
> > > > >> > > >    Jenkins master.
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > Irrespective of the solution we come up, I think we
> > > > >> > > > should initiate a technical design discussion on how to
> > > > >> > > > setup the CI build system. Probably 1 or 2 pager
> > > > >> > > > documents with the architecture and review with Infra and
> > > > >> > > > community members.
> > > > >> > > >
> > > > >> > > > ***There were few proposal and discussion on the slack
> > > > >> > > > channel, to reach wider community members, moving that
> > > > >> > > > discussion formally to this list.
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > My Proposal: Option 1 - Set up separate Jenkins CI build
> > > > >> > > > system.
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > >
> > > > >> > > > Sandeep
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > --
> > > > >> > > > Sandeep Krishnamurthy
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
>

--
Sandeep Krishnamurthy