From: Pedro Larroy
Date: Thu, 26 Oct 2017 15:49:38 +0200
Subject: Re: [Proposal] Stabilizing Apache MXNet CI build system
To: dev@mxnet.incubator.apache.org

Thanks for your input, guys. I think we are all on a good track to get this
fixed. I'm confident that Meghna and Marco are going to drive this to success.

We are collecting ideas and requirements for the document on how we will
revamp the testing infrastructure. My only question right now is where to
store this document so that we can collaborate on it. I don't seem to have
permission in Confluence to edit the wiki:
https://cwiki.apache.org/confluence/display/MXNET/Continuous+Integration

Should we use a shared Google Doc or a GitHub wiki instead? Please advise.

Pedro.

On Thu, Oct 26, 2017 at 8:14 AM, Meghna Baijal wrote:

> Thanks Sandeep for driving this discussion. I am also in contact with Pedro
> and his team to include their requirements.
> And thank you Sebastian, I will let you know!
>
> Meghna
>
> On Wed, Oct 25, 2017 at 11:05 PM, Sebastian wrote:
>
> > @meghana @pedro let me know if you need someone with a mentor hat to open
> > tickets or send mail to infra, happy to help here.
> >
> > Best,
> > Sebastian
> >
> > On 25.10.2017 23:18, sandeep krishnamurthy wrote:
> >
> >> Thank you, everyone, for the discussion, proposal, and the vote.
> >>
> >> The majority of community members see that the current CI system for
> >> Apache MXNet has issues with scaling and diverse test environments. The
> >> common suggestion is to have a separate CI setup for Apache MXNet.
> >>
> >> Following are the next steps:
> >>
> >> 1.
> >>    Meghna proposed that she would like to take the lead on this and come
> >>    up with an initial tech design write-up covering requirements,
> >>    use cases, alternate solutions, and a proposed solution for how we
> >>    could set up the CI system for MXNet.
> >> 2. This tech design will be reviewed in the community; following that,
> >>    we will collaborate with the Infra team and mentors to complete the
> >>    setup and the integration of the new system with the repo, the
> >>    website, and more.
> >>
> >> @Pedro Larroy - We should sync up on understanding how we can unify the
> >> setup you have for various devices and the new setup being proposed and
> >> built. Ideally, we should have a unified CI setup for the project
> >> accessible to the community.
> >>
> >> Regards,
> >> Sandeep
> >>
> >> On Mon, Oct 23, 2017 at 7:29 AM, Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
> >>
> >>> +1
> >>>
> >>> We (with Kellen and Marco) are already working on a CI system that
> >>> verifies MXNet on devices. It is still a work in progress, but at least
> >>> we are checking that the build is sane on Android, different ARM
> >>> flavors, and Ubuntu, and we are also building PRs. We are still working
> >>> on getting the unit tests to pass on some architectures like Jetson TX2
> >>> and ARM / Raspberry Pi.
> >>>
> >>> http://ci.mxnet.amazon-ml.com/
> >>>
> >>> I agree with Steffen on creating a document with requirements and
> >>> high-level architecture. I would also like to have quicker feedback
> >>> and, as we discussed before, saner unit tests. I think there's a big
> >>> and nontrivial amount of effort required here.
> >>>
> >>> Pedro.
> >>>
> >>> On Mon, Oct 23, 2017 at 6:43 AM, Steffen Rochel <steffenrochel@gmail.com> wrote:
> >>>
> >>>> +1
> >>>> I support Option 1 - Set up separate Jenkins CI build system.
> >>>> While the Apache service is appropriate for some projects, our
> >>>> experience over the last 6 months is that it has not been meeting the
> >>>> needs of the MXNet (incubating) project. AWS has been providing and
> >>>> will continue to provide resources for such a project.
> >>>> I agree we should create a document summarizing the requirements and
> >>>> high-level architecture, which should answer the question of Jenkins
> >>>> or an alternative.
> >>>>
> >>>> Steffen
> >>>>
> >>>> On Sat, Oct 21, 2017 at 6:51 PM shiwen hu wrote:
> >>>>
> >>>>> +1
> >>>>>
> >>>>> 2017-10-21 9:48 GMT+08:00 Chris Olivier:
> >>>>>
> >>>>>> Ok, just looking for anything that can cut a task out if possible.
> >>>>>> I do support not using the Apache Jenkins server anymore; it's
> >>>>>> really not been working out for various reasons. But having a
> >>>>>> person full time is something that Steffen would have to address,
> >>>>>> I imagine.
> >>>>>>
> >>>>>> On Fri, Oct 20, 2017 at 6:03 PM Mu Li wrote:
> >>>>>>
> >>>>>>> I didn't see a clear advantage of CodePipeline over pure Jenkins,
> >>>>>>> because we don't need to deploy here.
> >>>>>>>
> >>>>>>> On Fri, Oct 20, 2017 at 5:34 PM, Chris Olivier <cjolivier01@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> CodePipeline, then. You can point it to Jenkins instances.
> >>>>>>>>
> >>>>>>>> On Fri, Oct 20, 2017 at 4:49 PM Mu Li wrote:
> >>>>>>>>
> >>>>>>>>> AWS CodeBuild is not an option. It doesn't support GPU
> >>>>>>>>> instances, Mac OS X, or Windows, not to mention the edge
> >>>>>>>>> devices.
> >>>>>>>>>
> >>>>>>>>> On Fri, Oct 20, 2017 at 4:07 PM, Chris Olivier <cjolivier01@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Why don't we look into fully managed AWS CodeBuild?
> >>>>>>>>>> It maintains everything. It's also compatible with Jenkins.
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Oct 20, 2017 at 1:51 PM, Tianqi Chen <tqchen@cs.washington.edu> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> +1
> >>>>>>>>>>>
> >>>>>>>>>>> Tianqi
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Oct 20, 2017 at 1:39 PM Mu Li wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> +1
> >>>>>>>>>>>>
> >>>>>>>>>>>> It seems that the Apache CI is quite overloaded these days,
> >>>>>>>>>>>> and MXNet's CI pipeline is too complex to run there. In
> >>>>>>>>>>>> addition, we may need to add more devices, e.g. Mac Pro and
> >>>>>>>>>>>> Raspberry Pi, to the server, and more tasks such as the pip
> >>>>>>>>>>>> build. That means a lot of requests to the Infra team.
> >>>>>>>>>>>>
> >>>>>>>>>>>> We can reuse our previous Jenkins server at
> >>>>>>>>>>>> http://ci.mxnet.io/. But we probably need a dedicated
> >>>>>>>>>>>> developer to maintain it.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <sandeep.krishna98@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hello all,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I am hereby opening up a discussion thread on how we can
> >>>>>>>>>>>>> stabilize the Apache MXNet CI build system.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Problems:
> >>>>>>>>>>>>> ========
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Recently, we have seen the following issues with the Apache
> >>>>>>>>>>>>> MXNet CI build systems:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1. The Apache Jenkins master is overloaded, and we see
> >>>>>>>>>>>>>    issues such as being unable to trigger builds and
> >>>>>>>>>>>>>    difficulty loading and viewing the Blue Ocean and other
> >>>>>>>>>>>>>    Jenkins build status pages.
> >>>>>>>>>>>>> 2. We are generating too many requests/interactions for the
> >>>>>>>>>>>>>    Apache Infra team:
> >>>>>>>>>>>>>    1. Addition/deletion of slaves: caused by scaling
> >>>>>>>>>>>>>       activity, recycling, troubleshooting, or any action
> >>>>>>>>>>>>>       leading to a change of slave machines.
> >>>>>>>>>>>>>    2. Plugins / other Jenkins master configurations.
> >>>>>>>>>>>>>    3. Experimentation on CI pipelines.
> >>>>>>>>>>>>> 3. Harder to debug and resolve issues: since access to
> >>>>>>>>>>>>>    master and slave is not with the same community, it
> >>>>>>>>>>>>>    requires Infra and the community to dive deep together
> >>>>>>>>>>>>>    on all action items.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Possible Solutions:
> >>>>>>>>>>>>> ==============
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1.
> >>>>>>>>>>>>>    Can we set up a separate Jenkins CI build system for
> >>>>>>>>>>>>>    Apache MXNet outside Apache Infra?
> >>>>>>>>>>>>> 2. Can we have a separate Jenkins master in Apache Infra
> >>>>>>>>>>>>>    for MXNet?
> >>>>>>>>>>>>> 3. Review the design of the current setup, refine it, and
> >>>>>>>>>>>>>    fill the gaps.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> @ Mentors/Infra team/Community:
> >>>>>>>>>>>>> ==========================
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Please provide your suggestions on how we can proceed
> >>>>>>>>>>>>> further and work on stabilizing the CI build systems for
> >>>>>>>>>>>>> MXNet.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Also, if the community decides on a separate Jenkins CI
> >>>>>>>>>>>>> build system, what important points should be taken care of
> >>>>>>>>>>>>> apart from the ones below?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1. Community being able to access the build page for build
> >>>>>>>>>>>>>    statuses.
> >>>>>>>>>>>>> 2. Committers being able to log in with Apache credentials.
> >>>>>>>>>>>>> 3. Hook setup from the apache/incubator-mxnet repo to the
> >>>>>>>>>>>>>    Jenkins master.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Irrespective of the solution we come up with, I think we
> >>>>>>>>>>>>> should initiate a technical design discussion on how to set
> >>>>>>>>>>>>> up the CI build system.
> >>>>>>>>>>>>> Probably a 1- or 2-page document with the architecture,
> >>>>>>>>>>>>> reviewed with Infra and community members.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ***There were a few proposals and discussions on the Slack
> >>>>>>>>>>>>> channel; to reach wider community members, I am moving that
> >>>>>>>>>>>>> discussion formally to this list.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> My Proposal: Option 1 - Set up separate Jenkins CI build
> >>>>>>>>>>>>> system.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Sandeep
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> Sandeep Krishnamurthy