From: Marco de Abreu
Date: Tue, 16 Jun 2020 01:19:51 +0200
Subject: Re: [CI] Staggered build pipelines enabled
To: dev@mxnet.incubator.apache.org

Hello,

I'd like to revisit this decision and review whether the expected benefit (cost reduction) was achieved and how the overall PR validation duration has changed. Could you guys share some information on this matter?

Just today, we've had two incidents that were caused by this change:

1. This PR was merged prematurely because the follow-up pipelines didn't run (I don't know whether it's a timing issue or something else). We've always had a policy of not enforcing protected feature branches, and this pipeline is causing human errors.
https://github.com/apache/incubator-mxnet/pull/18560

2. The sanity check, which is now on the critical path, has started running into timeouts. I'm aware that this happens on cache misses, but it nonetheless heavily increases the waiting time, since developers now have to wait for two entire cache build cycles instead of just one: https://github.com/apache/incubator-mxnet/pull/18568

I understand that money has to be conserved, but I still stand by my opinion that this was the wrong move and that development speed was sacrificed. If there are no other compelling arguments, I'd prefer that the previous state of parallel pipelines be restored.

Best regards
Marco

On Tue, 28 Apr 2020, 07:55, sandeep krishnamurthy wrote:
> Thanks a lot, Joe, for your contributions. Thank you Marco, Chai and Leo for
> helping with this.
> Especially given that you had seen around 57% of builds failing in the sanity
> check, this should be very helpful for giving PR authors faster feedback on
> sanity issues, plus it saves a lot of unnecessary builds.
>
> Best,
> Sandeep
>
> On Mon, 27 Apr 2020, 10:20 pm Joe Evans wrote:
>
> > Hi dev community,
> >
> > We have made the changes to the mxnet CI system to incorporate the
> > staggered build pipelines. With this change, when a new PR is created or an
> > existing PR is updated, the status checks will only show the
> > “ci/jenkins/mxnet-validation/sanity” build job at first. Once this build
> > completes successfully (average run time is about 10 min), the remaining CI
> > build jobs will appear and function as before.
> >
> > Please let me know if you experience any issues with this change.
> >
> > Thanks!
> >
> > Joe
> >
> > References:
> >
> > https://github.com/apache/incubator-mxnet/issues/17802
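To make the trade-off in this thread concrete, here is a rough Python sketch comparing how long a PR waits and how much compute it consumes under parallel versus staggered pipelines. It is a toy model, not the actual Jenkins setup. Only the ~10 min average sanity run time and the ~57% sanity failure rate come from the thread; the class name, the other job durations and the 40 min cache-miss case are hypothetical placeholders, and the model ignores queueing, retries and aborted jobs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineModel:
    """Hypothetical toy model of one PR validation run; durations in minutes.

    Assumes every started job runs to completion and ignores executor
    queueing, retries and aborts.
    """

    sanity_minutes: float             # sanity job duration (incl. any cache rebuild)
    other_minutes: tuple[float, ...]  # durations of the remaining build/test jobs
    sanity_fail_rate: float           # fraction of runs that fail the sanity check

    def parallel_latency(self) -> float:
        # All jobs start together, so the PR waits for the slowest single job.
        return max(self.sanity_minutes, *self.other_minutes)

    def staggered_latency(self) -> float:
        # Remaining jobs start only after sanity passes, so the waits add up.
        return self.sanity_minutes + max(self.other_minutes)

    def parallel_compute(self) -> float:
        # Every job runs on every PR, even when sanity fails.
        return self.sanity_minutes + sum(self.other_minutes)

    def staggered_expected_compute(self) -> float:
        # Remaining jobs run only for the fraction of PRs that pass sanity.
        return self.sanity_minutes + (1 - self.sanity_fail_rate) * sum(self.other_minutes)


if __name__ == "__main__":
    # 10 min sanity and 57% failure rate are from the thread; the rest are placeholders.
    warm = PipelineModel(10, (120, 90, 60), 0.57)
    print(warm.parallel_latency(), warm.staggered_latency())  # 120 130
    print(warm.parallel_compute(), round(warm.staggered_expected_compute(), 1))  # 280 126.1
    # Hypothetical cache miss: the sanity job itself takes 40 min.
    cold = PipelineModel(40, (120, 90, 60), 0.57)
    print(cold.parallel_latency(), cold.staggered_latency())  # 120 160
```

In this model, whenever the sanity job is shorter than the slowest remaining job, staggering adds the full sanity duration to the wait of every PR that passes, so a cache rebuild inside the sanity stage is stacked on top of the downstream jobs' own build time instead of overlapping with it. In exchange, the remaining jobs are skipped entirely for every PR that fails sanity.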