Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 0E72D200CFC for ; Thu, 28 Sep 2017 15:57:27 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 0B44A1609CD; Thu, 28 Sep 2017 13:57:27 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 034651609C2 for ; Thu, 28 Sep 2017 15:57:25 +0200 (CEST) Received: (qmail 13085 invoked by uid 500); 28 Sep 2017 13:57:25 -0000 Mailing-List: contact dev-help@mxnet.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@mxnet.incubator.apache.org Delivered-To: mailing list dev@mxnet.incubator.apache.org Received: (qmail 13073 invoked by uid 99); 28 Sep 2017 13:57:24 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 28 Sep 2017 13:57:24 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 5D23C1A39FC for ; Thu, 28 Sep 2017 13:57:24 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.6 X-Spam-Level: X-Spam-Status: No, score=0.6 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_REPLY=1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-2.8, RCVD_IN_SORBS_SPAM=0.5, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id 30bsBdfE1J7p for ; Thu, 28 Sep 2017 13:57:18 +0000 (UTC) Received: from mail-wr0-f181.google.com (mail-wr0-f181.google.com [209.85.128.181]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id A764C5F29A for ; Thu, 28 Sep 2017 13:57:17 +0000 (UTC) Received: by mail-wr0-f181.google.com with SMTP id a43so2855687wrc.0 for ; Thu, 28 Sep 2017 06:57:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=B53RC1xf0adOJMXedaEht3h1+cnYGrYirTRnL4/NSpI=; b=lax0prlfKjRuBE4+rB3oifa4jKJvsRlZX6YiDwp8EF1N81oR7CE/U0KU6b7+0pR9jx C6dAurY1pM76FDrn59eoGtcK/XfWuw1MtVgNbY2PvURmT01hjy4QyzD8guvgNj8N6y/3 fJeQ0Y71SKT6QaXK4RIR+KQHfew744Nd0BaymVembPp7PT0dMGZJeYiEzsZiQfeyni8l ABdPoLaNYyiy322OimzErGh4TIgwCv8yLfh3E/DJ0cbLc1U00wuQH0OAXNhxhfZRw9YE d9Y64mrizDA0SIxqqomqQ9kjxsxtzEkTxawl7HF8HOO/upzQZrMRAT8zSzFCcZYiDASs PA5g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=B53RC1xf0adOJMXedaEht3h1+cnYGrYirTRnL4/NSpI=; b=OwxPO5PZUtND/AfkaWrrh+f7rd5X3nmc4KxVjAYTaZau+ZhdQ6ZlzRlVoBPuobf+yw 42/hfU2pXcqVOEC5cn8mSmfclMh+sXJRSF/y4ytB9mubqWjUofioE/2lrzOmeGgN3Mdf UEpzg0KYwxhpbQTULMcmSXE5ANEasVlJz5v6abXg2uYHTCip4HAJAKF4s3pyKxoSHeCZ 9lEu6D2jLUPznbLmjx2SsVsdkXVe1cqshRPFJG5KoXTJrzmsZpGKdRaO9P0a01UEVMBT BzDOl7hE/7VZFpiKGibf3sgAZqceqJdU+TYaEXiSPmjl7iouqzuyt62KsGXbRmTkxcJF 5tIw== X-Gm-Message-State: AHPjjUi5Egx+cXC2uHlZVTHc6srXqKrVtpYNuX6orhVNM6tpF71IBOVl f8CmaeqE93oTy4f/Gj7ziUBYjFMok1xN+FfdUHsOwQ== X-Google-Smtp-Source: AOwi7QDMiRZAlU+fmBuYKr921awdz4eVbbavr3KmxPdBWTpTRYBYw2OYI5FPP6gIf0TAlcLxubSNDFp+cYSR4s2AwJU= X-Received: by 10.46.21.84 with SMTP id 20mr2073684ljv.168.1506607036246; Thu, 28 Sep 2017 06:57:16 -0700 (PDT) MIME-Version: 1.0 Received: by 10.25.232.4 with HTTP; Thu, 28 Sep 2017 06:57:15 -0700 (PDT) In-Reply-To: References: <39AE9373-C107-4D7B-8F62-5D3AC57F95E2@amazon.com> <087FD172-52DE-42F6-B889-BD767E05A795@gmail.com> <5A99919F-2BCB-45EE-8259-01BF1C980CF1@gmail.com> From: Pedro Larroy Date: Thu, 28 Sep 2017 15:57:15 +0200 Message-ID: Subject: Re: Apache MXNet build failures are mostly valid - verify before merge To: dev@mxnet.incubator.apache.org Content-Type: multipart/alternative; boundary="94eb2c1cda1e362def055a404932" archived-at: Thu, 28 Sep 2017 13:57:27 -0000 --94eb2c1cda1e362def055a404932 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Given the cost of running all the test for all the build flavors and architectures I would propose the following: - Have a staging branch were PRs are merged by committers which runs all the integration tests with the appropriate frequency (say nightly). - Have automated fast forwards from the staging branch done automatically by Jenkins when the latest head passes all the tests for a= ll platforms. With this we would always have a stable master branch which is well tested, while being able to adjust the tradeoff between correctness and quick feedback for PRs. Another improvement would be to split the feedback in stages, one would be multi-platform / multi-flavor build which should be around 20 minutes, and then two or more stages of quick tests and extensive tests. And as I explained, we wouldn't need to run extensive tests on every PR, just nightly on staging. What do you think? Pedro. On Thu, Sep 28, 2017 at 2:02 PM, Joern Kottmann wrote: > At Apache OpenNLP we just established among committers that you check > that the status indicator is green before you merge, > and if it wasn't the case then we would ask the committer to take > responsibility and repair things. Works very well our build is never > broken. > > We also strongly prefer if each PR gets reviewed by another committer. > > Overall this works quite well. We don't use any of the protections > against merging, it is important that you can trust each of the > committers not to break things, if there are problems it is better to > resolve them with talking to each other, rather than enforcing green > status checks. > > J=C3=B6rn > > On Thu, Sep 28, 2017 at 8:21 AM, Chris Olivier > wrote: > > +1 on that > > > > On Wed, Sep 27, 2017 at 11:15 PM Gautam wrote: > > > >> Hi Chris, > >> > >> Here is > >> user > >> document on semantics of protected branch. > >> In short when a branch is protected following applies to that branch. > >> > >> - Can't be force pushed > >> - Can't be deleted > >> - Can't have changes merged into it until required status checks > >> pas= s > >> - Can't have changes merged into it until required reviews are > approved > >> < > >> https://help.github.com/articles/approving-a-pull- > request-with-required-reviews > >> > > >> - Can't be edited or have files uploaded to it from the web > >> - Can't have changes merged into it until changes to files that > >> have a designated > >> code owner hav= e > >> been approved by that owner > >> > >> I am sure many of us might not want to have all these but we can > debate on > >> it. My main motive was to "*Can't have changes merged into it until > >> required status checks pass*" > >> > >> > >> -Gautam > >> > >> > >> > >> On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier > >> wrote: > >> > >> > What does that mean? "Protected"? Protected from what? > >> > > >> > On Wed, Sep 27, 2017 at 11:08 PM Gautam wrote= : > >> > > >> > > Hi Chris, > >> > > > >> > > I mean make "master branch protected" of MXNet. > >> > > > >> > > -Gautam > >> > > > >> > > On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier < > cjolivier01@gmail.com > >> > > >> > > wrote: > >> > > > >> > > > What does this mean? "Mx-net branch protected"? > >> > > > > >> > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA < > >> > ozawa.tsuyoshi@gmail.com > >> > > > > >> > > > wrote: > >> > > > > >> > > > > +1, > >> > > > > > >> > > > > While I'm checking the recent build failures, and I think the > >> > decision > >> > > > > of making the mx-net branch protected is necessary for stable > >> > > > > building. > >> > > > > Thanks Kumar for resuming important discussion. > >> > > > > > >> > > > > Best regards > >> > > > > - Tsuyoshi > >> > > > > > >> > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam < > gauta@amazon.com> > >> > > > wrote: > >> > > > > > Reviving the discussion. > >> > > > > > > >> > > > > > At this point of time we have couple of stable builds > >> > > > > > > >> > > > > https://builds.apache.org/view/Incubator%20Projects/job/ > >> > > > incubator-mxnet/job/master/448/ > >> > > > > > > >> > > > > https://builds.apache.org/view/Incubator%20Projects/job/ > >> > > > incubator-mxnet/job/master/449/ > >> > > > > > > >> > > > > > Should we have a quick discussion or polling on making the > mx-net > >> > > > branch > >> > > > > protected? If you still think we shouldn=E2=80=99t make it pro= tected > please > >> > > > provide > >> > > > > a reason to support your claim. > >> > > > > > > >> > > > > > Few of us have concern over Jenkin=E2=80=99s stability. If I= look two > >> weeks > >> > > > > back, after upgrading Linux slave to g2.8x and new windows AMI= , > we > >> > have > >> > > > not > >> > > > > seen any case where instance died due to high memory usage or > any > >> > > process > >> > > > > got killed due to high cpu usage or any other issue with windo= ws > >> > > slaves. > >> > > > > > > >> > > > > > Going forward we are also planning that if we add any new > slave > >> we > >> > > will > >> > > > > not enable the main load immediately, but rather will do =E2= =80=98test > >> build=E2=80=99 > >> > > to > >> > > > > make sure that new slaves are not causing any infrastructure > issue > >> > and > >> > > > > capable to perform as good as existing slaves. > >> > > > > > > >> > > > > > -Gautam > >> > > > > > > >> > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" > wrote: > >> > > > > > > >> > > > > > @madan looking into some failures =E2=80=93 you=E2=80=99= re right=E2=80=A6 there=E2=80=99s > >> > > multiple > >> > > > > issues going on, some of them intermittent, and we want to be > able > >> to > >> > > > merge > >> > > > > fixes in. > >> > > > > > Agreed that we can wait with setting up protected mode > until > >> > > build > >> > > > > stabilizes. > >> > > > > > > >> > > > > > On 8/31/17, 11:41, "Madan Jampani" < > madan.jampani@gmail.com> > >> > > > wrote: > >> > > > > > > >> > > > > > @hagay: we agree on the end state. I'm not too > particular > >> > > about > >> > > > > how we get > >> > > > > > there. If you think enabling it now and fixes > regression > >> > > later > >> > > > > is doable, > >> > > > > > I'm fine with. I see a bit of a chicken and egg > problem. > >> We > >> > > > need > >> > > > > to get > >> > > > > > some fixes in even when the status checks are failin= g. > >> > > > > > > >> > > > > > On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay < > >> > > > > lupesko@gmail.com> wrote: > >> > > > > > > >> > > > > > > @madan =E2=80=93 re: getting to a stable CI first: > >> > > > > > > I=E2=80=99m concerned that by not enabling protect= ed branch > >> mode > >> > > > ASAP, > >> > > > > we=E2=80=99re just > >> > > > > > > taking in more regressions, which makes a stable > build > >> a > >> > > > > moving target for > >> > > > > > > us=E2=80=A6 > >> > > > > > > > >> > > > > > > On 8/31/17, 10:49, "Zha, Sheng" < > zhasheng@amazon.com> > >> > > wrote: > >> > > > > > > > >> > > > > > > Just one thing: please don=E2=80=99t disable m= ore tests > or > >> > just > >> > > > > raise the > >> > > > > > > tolerance thresholds. > >> > > > > > > > >> > > > > > > Best regards, > >> > > > > > > -sz > >> > > > > > > > >> > > > > > > On 8/31/17, 10:45 AM, "Madan Jampani" < > >> > > > > madan.jampani@gmail.com> wrote: > >> > > > > > > > >> > > > > > > +1 > >> > > > > > > Before we can turn protected mode I feel w= e > >> > should > >> > > > > first get to a > >> > > > > > > stable CI > >> > > > > > > pipeline. > >> > > > > > > Sandeep is chasing down known breaking > issues. > >> > > > > > > > >> > > > > > > > >> > > > > > > On Thu, Aug 31, 2017 at 10:27 AM, Hagay > >> Lupesko < > >> > > > > lupesko@gmail.com> > >> > > > > > > wrote: > >> > > > > > > > >> > > > > > > > Build stability is a major issue, builds > have > >> > > been > >> > > > > failing left > >> > > > > > > and right > >> > > > > > > > over the last week. Some of it is due to > >> > Jenkins > >> > > > > slave issues, > >> > > > > > > but some are > >> > > > > > > > real regressions. > >> > > > > > > > We need to be more strict in the code > we're > >> > > > > committing. > >> > > > > > > > > >> > > > > > > > I propose we configure our master to be = a > >> > > protected > >> > > > > branch ( > >> > > > > > > > > >> > > > > https://help.github.com/articles/about-protected-branches/). > >> > > > > > > > > >> > > > > > > > Thoughts? > >> > > > > > > > > >> > > > > > > > On 2017-08-28 22:41, sandeep > krishnamurthy < > >> > > > > s...@gmail.com> > >> > > > > > > wrote: > >> > > > > > > > > Hello Committers and Contributors,> > >> > > > > > > > > > >> > > > > > > > > Due to unstable build pipelines, from > past > >> 1 > >> > > > week, > >> > > > > PRs are > >> > > > > > > being merged> > >> > > > > > > > > after CR ignoring PR build status. Bui= ld > >> > > pipeline > >> > > > > is much more > >> > > > > > > stable > >> > > > > > > > than> > >> > > > > > > > > last week and most of the build failur= es > >> you > >> > > see > >> > > > > from now on, > >> > > > > > > are likely > >> > > > > > > > to> > >> > > > > > > > > be a valid failure and hence, it is > >> > recommended > >> > > > to > >> > > > > wait for PR > >> > > > > > > builds, > >> > > > > > > > see> > >> > > > > > > > > the root cause of any build failures > before > >> > > > > proceeding with > >> > > > > > > merges.> > >> > > > > > > > > > >> > > > > > > > > At this point of time, there are 2 > >> > intermittent > >> > > > > issue yet to > >> > > > > > > be fixed -> > >> > > > > > > > > * Network error leading to GitHub > requests > >> > > > > throwing 404> > >> > > > > > > > > * A conflict in artifacts generated > between > >> > > > > branches/PR - > >> > > > > > > Cause unknown > >> > > > > > > > yet.> > >> > > > > > > > > These issues will be fixed soon.> > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > -- > > >> > > > > > > > > Sandeep Krishnamurthy> > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > -- > >> > > > > - Tsuyoshi > >> > > > > > >> > > > > >> > > > >> > > > >> > > > >> > > -- > >> > > Best Regards, > >> > > Gautam Kumar > >> > > > >> > > >> > >> > >> > >> -- > >> Best Regards, > >> Gautam Kumar > >> > --94eb2c1cda1e362def055a404932--