From: Meghna Baijal
Date: Thu, 9 Nov 2017 12:23:01 -0800
Subject: Re: [Proposal] Stabilizing Apache MXNet CI build system
To: dev@mxnet.incubator.apache.org
In-Reply-To: <612493C1-4BB5-496C-BD5C-5ABDD4FC5632@amazon.com>

Pedro,
I created a row for BuildBot in the doc. Do you want to add some pros and
cons about it? It would be good to have all this information collected in
one place.

Meghna

On Thu, Nov 9, 2017 at 4:40 AM, Larroy, Pedro wrote:

> Thanks a lot for the document and leading the discussion.
>
> Does anybody have experience with a build system other than Jenkins? In
> the document we mention Teamcity as a possible option, and there’s also the
> second leading open source CI tool “Buildbot” which is not mentioned.
>
> I’m not sure if we have strong evidence to have an informed decision about
> using something other than Jenkins, also from the document I get that the
> negatives of Jenkins are pretty minor compared to the other frameworks.
>
> I would be interested to read if somebody has used any other framework in
> depth and is willing to vote against using Jenkins so we can all do an
> informed vote.
>
> I don’t feel comfortable voting for Jenkins because is the only one I know
> as well.
>
> Kind regards.
> --
>
> Pedro
>
> On 08/11/17 23:41, "Meghna Baijal" wrote:
>
> Thanks for the active discussion on the document for the new CI for MXNet.
> Now that many of you have reviewed it, do you think I should start a vote
> on which framework the community wants to move forward with?
>
> Thanks,
> Meghna
>
> On Mon, Nov 6, 2017 at 6:59 PM, Chris Olivier wrote:
>
> > After a decision is reached, i am willing to add tasks to Apache MXNet JIRA
> >
> > On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy <pedro.larroy.lists@gmail.com>
> > wrote:
> >
> > > Thanks for setting up the document guys, looks like a solid basis to
> > > start to work on!
> > >
> > > Marco, Kellen and I have already added some comments.
> > >
> > > Pedro
> > >
> > > On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal wrote:
> > >
> > > > Kellen, Thank you for your comments in the doc.
> > > > Sure Steffen, I will continue to merge everyone’s comments into the doc
> > > > and work with Pedro to finalize it.
> > > > And then we can vote on the options.
> > > >
> > > > Thanks,
> > > > Meghna Baijal
> > > >
> > > > On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel <steffenrochel@gmail.com>
> > > > wrote:
> > > >
> > > > > Sandeep and Meghna have been working in background collecting input and
> > > > > preparing a doc. I suggest to drive discussion forward and would like to
> > > > > ask everybody to contribute to
> > > > > https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDkdlavUDASzUmLjk/edit?usp=sharing
> > > > >
> > > > > Lets converge on requirements and architecture, so we can move forward
> > > > > with implementation.
> > > > >
> > > > > I would like to suggest for Pedro and Meghna to lead the discussion and
> > > > > help to resolve suggestions.
> > > > >
> > > > > I assume we need a vote once we are converged on a good draft to call it
> > > > > a plan and move forward with implementation. As we all are unhappy with
> > > > > the current CI situation I would also suggest a phased approach, so we
> > > > > can get back to reliable and efficient basic CI quickly and add advanced
> > > > > capabilities over time.
> > > > >
> > > > > Steffen
> > > > >
> > > > > On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland
> > > > > <kellen.sunderland@gmail.com> wrote:
> > > > >
> > > > > > Hey Henri, I think that's what a few of us are advocating. Running a
> > > > > > set of quick tests as part of the PR process, and then a more detailed
> > > > > > regression test suite periodically (say every 4 hours). This fits
> > > > > > nicely into a tagging or 2 branch development system. Commits will be
> > > > > > tagged (or merged into a stable branch) as soon as they pass the
> > > > > > detailed regression testing.
> > > > > >
> > > > > > On Wed, Nov 1, 2017 at 9:07 PM, Hen wrote:
> > > > > >
> > > > > > > Random question - can the CI be split such that the Apache CI is
> > > > > > > doing a basic set of checks on that hardware, and is hooked to a PR,
> > > > > > > while there is a larger "Is trunk good for release?" test that is
> > > > > > > running periodically rather than on every PR?
> > > > > > >
> > > > > > > ie: do we need each PR to be run on varied hardware, or can we have
> > > > > > > this two tier approach?
> > > > > > >
> > > > > > > Hen
> > > > > > >
> > > > > > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy
> > > > > > > <sandeep.krishna98@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hello all,
> > > > > > > >
> > > > > > > > I am hereby opening up a discussion thread on how we can stabilize
> > > > > > > > Apache MXNet CI build system.
> > > > > > > >
> > > > > > > > Problems:
> > > > > > > >
> > > > > > > > ========
> > > > > > > >
> > > > > > > > Recently, we have seen following issues with Apache MXNet CI build
> > > > > > > > systems:
> > > > > > > >
> > > > > > > > 1. Apache Jenkins master is overloaded and we see issues like -
> > > > > > > >    unable to trigger builds, difficult to load and view the blue
> > > > > > > >    ocean and other Jenkins build status page.
> > > > > > > > 2. We are generating too many request/interaction on Apache Infra
> > > > > > > >    team.
> > > > > > > >    1. Addition/deletion of new slave: Caused from scaling activity,
> > > > > > > >       recycling, troubleshooting or any actions leading to change
> > > > > > > >       of slave machines.
> > > > > > > >    2. Plugins / other Jenkins Master configurations.
> > > > > > > >    3. Experimentation on CI pipelines.
> > > > > > > > 3. Harder to debug and resolve issues - Since access to master and
> > > > > > > >    slave is not with the same community, it requires Infra and
> > > > > > > >    community to dive deep together on all action items.
> > > > > > > >
> > > > > > > > Possible Solutions:
> > > > > > > >
> > > > > > > > ==============
> > > > > > > >
> > > > > > > > 1. Can we set up a separate Jenkins CI build system for Apache
> > > > > > > >    MXNet outside Apache Infra?
> > > > > > > > 2. Can we have a separate Jenkins Master in Apache Infra for MXNet?
> > > > > > > > 3. Review design of current setup, refine and fill the gaps.
> > > > > > > >
> > > > > > > > @ Mentors/Infra team/Community:
> > > > > > > >
> > > > > > > > ==========================
> > > > > > > >
> > > > > > > > Please provide your suggestions on how we can proceed further and
> > > > > > > > work on stabilizing the CI build systems for MXNet.
> > > > > > > >
> > > > > > > > Also, if the community decides on separate Jenkins CI build system,
> > > > > > > > what important points should be taken care of apart from the below:
> > > > > > > >
> > > > > > > > 1. Community being able to access the build page for build statuses.
> > > > > > > > 2. Committers being able to login with apache credentials.
> > > > > > > > 3. Hook setup from apache/incubator-mxnet repo to Jenkins master.
> > > > > > > >
> > > > > > > > Irrespective of the solution we come up, I think we should initiate
> > > > > > > > a technical design discussion on how to setup the CI build system.
> > > > > > > > Probably 1 or 2 pager documents with the architecture and review
> > > > > > > > with Infra and community members.
> > > > > > > >
> > > > > > > > ***There were few proposal and discussion on the slack channel, to
> > > > > > > > reach wider community members, moving that discussion formally to
> > > > > > > > this list.
> > > > > > > >
> > > > > > > > My Proposal: Option 1 - Set up separate Jenkins CI build system.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Sandeep
> > > > > > > >
> > > > > > > > --
> > > > > > > > Sandeep Krishnamurthy
>
> Amazon Development Center Germany GmbH
> Berlin - Dresden - Aachen
> main office: Krausenstr. 38, 10117 Berlin
> Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
> Ust-ID: DE289237879
> Eingetragen am Amtsgericht Charlottenburg HRB 149173 B