From: kellen sunderland
Date: Tue, 2 Oct 2018 03:15:15 +0200
Subject: Re: Time out for Travis CI
To:
dev@mxnet.incubator.apache.org

I actually thought we were already using a paid plan through Apache:
https://blogs.apache.org/infra/entry/apache_gains_additional_travis_ci

On Tue, Oct 2, 2018, 3:11 AM Qing Lan wrote:

> Are we currently on a free plan? If we are, the unlimited build minutes
> would probably help.
>
> Thanks,
> Qing
>
> On 10/1/18, 6:08 PM, "kellen sunderland" wrote:
>
>     Does the global timeout change for paid plans? I looked into it
>     briefly but didn't see anything that would indicate it does.
>
>     On Tue, Oct 2, 2018, 2:25 AM Pedro Larroy <pedro.larroy.lists@gmail.com>
>     wrote:
>
>     > I think there are two approaches we can take to mitigate the build &
>     > test time problem. On one hand, we could use a paid Travis CI plan; on
>     > the other, we could improve the unit test suites and run only a core
>     > set of tests, as we should do on devices, though in this case we
>     > reduce coverage.
>     >
>     > https://travis-ci.com/plans
>     >
>     > Pedro.
>     >
>     > On Sat, Sep 29, 2018 at 6:53 PM YiZhi Liu wrote:
>     >
>     > > This makes sense. Thanks
>     > >
>     > > On Sat, Sep 29, 2018 at 6:36 PM kellen sunderland <
>     > > kellen.sunderland@gmail.com> wrote:
>     > >
>     > > > Hey Zhennan, yes, this is the exact problem, and I agree with your
>     > > > points completely. This is why, when we first added Travis, we
>     > > > attempted to communicate that it would be informational only, and
>     > > > that we'd need to iterate on the config before it would be a test
>     > > > that people should consider 'required'. Apologies, we should have
>     > > > been more straightforward about those tradeoffs. The strong point
>     > > > in favour of adding Travis in informational mode was that we had a
>     > > > serious macOS-specific bug that we wanted to verify was fixed.
>     > > >
>     > > > The good news is I've opened a PR which I hope will speed up these
>     > > > builds to the point that they won't rely on caching. Once it is
>     > > > merged, it would be very helpful if you could rebase on this PR and
>     > > > test to ensure that large changes no longer hit the global timeout
>     > > > without cache.
>     > > > https://github.com/apache/incubator-mxnet/pull/12706
>     > > >
>     > > > On Sun, Sep 30, 2018 at 2:48 AM Qin, Zhennan <zhennan.qin@intel.com>
>     > > > wrote:
>     > > >
>     > > > > Hi YiZhi and Kellen,
>     > > > >
>     > > > > From my point of view, Travis should be able to pass from a
>     > > > > scratch build. Making the result depend on ccache hits/misses is
>     > > > > not a good idea. This PR changed many header files, so lots of
>     > > > > files need to be recompiled, just like in a scratch build. I think
>     > > > > that's the reason Travis times out. This should be fixed before
>     > > > > enabling Travis, as it will block any change to those base header
>     > > > > files. Again, this isn't a special case with this PR only; you can
>     > > > > find the same problem on other PRs:
>     > > > >
>     > > > > https://travis-ci.org/apache/incubator-mxnet/builds/433172088?utm_source=github_status&utm_medium=notification
>     > > > > https://travis-ci.org/apache/incubator-mxnet/builds/434404305?utm_source=github_status&utm_medium=notification
>     > > > >
>     > > > > Thanks,
>     > > > > Zhennan
>     > > > >
>     > > > > -----Original Message-----
>     > > > > From: YiZhi Liu [mailto:eazhi.liu@gmail.com]
>     > > > > Sent: Sunday, September 30, 2018 5:15 AM
>     > > > > To: eazhi.liu@gmail.com
>     > > > > Cc: dev@mxnet.incubator.apache.org
>     > > > > Subject: Re: Time out for Travis CI
>     > > > >
>     > > > > while other PRs are all good.
>     > > > > On Sat, Sep 29, 2018 at 2:13 PM YiZhi Liu wrote:
>     > > > > >
>     > > > > > Honestly, I don't know yet.
>     > > > > > I can help investigate. I'm just going by the evidence that
>     > > > > > Travis times out every time it gets re-triggered (at least twice
>     > > > > > so far). Correct me if I'm wrong, @Zhennan.
>     > > > > >
>     > > > > > On Sat, Sep 29, 2018 at 1:54 PM kellen sunderland wrote:
>     > > > > > >
>     > > > > > > Reading over the PR, I don't see what aspects would cause
>     > > > > > > extra runtime. YiZhi, could you point them out?
>     > > > > > >
>     > > > > > > On Sat, Sep 29, 2018 at 8:46 PM YiZhi Liu <eazhi.liu@gmail.com>
>     > > > > > > wrote:
>     > > > > > >
>     > > > > > > > Kellen, I think this PR introduces extra runtime in CI and
>     > > > > > > > thus causes the timeout. That means that, once it is merged,
>     > > > > > > > every later PR will see the same timeout in Travis.
>     > > > > > > >
>     > > > > > > > So shall we modify the changes to decrease the test running
>     > > > > > > > time, or just disable Travis CI?
>     > > > > > > >
>     > > > > > > > On Fri, Sep 28, 2018 at 9:17 PM Qin, Zhennan wrote:
>     > > > > > > > >
>     > > > > > > > > Hi Kellen,
>     > > > > > > > >
>     > > > > > > > > Thanks for your explanation. Do you have a timeline for
>     > > > > > > > > solving the timeout issue? Rebasing doesn't work for my
>     > > > > > > > > case. Or shall we run it silently, so that it can't vote X
>     > > > > > > > > on the overall CI result? Most developers are used to
>     > > > > > > > > ignoring PRs marked with an 'X'.
>     > > > > > > > >
>     > > > > > > > > Thanks,
>     > > > > > > > > Zhennan
>     > > > > > > > >
>     > > > > > > > > -----Original Message-----
>     > > > > > > > > From: kellen sunderland [mailto:kellen.sunderland@gmail.com]
>     > > > > > > > > Sent: Friday, September 28, 2018 10:38 PM
>     > > > > > > > > To: dev@mxnet.incubator.apache.org
>     > > > > > > > > Subject: Re: Time out for Travis CI
>     > > > > > > > >
>     > > > > > > > > Hey Zhennan, you're safe to ignore Travis failures for now.
>     > > > > > > > > They're just informational.
>     > > > > > > > >
>     > > > > > > > > The reason you sometimes see quick builds and sometimes see
>     > > > > > > > > slow builds is that we're making use of ccache between
>     > > > > > > > > builds. If your PR is similar to what's in master, you
>     > > > > > > > > should build very quickly; if not, it's going to take a
>     > > > > > > > > while and will likely time out. If you see timeouts,
>     > > > > > > > > rebasing may speed things up. Unfortunately, the timeouts
>     > > > > > > > > are global and we're not able to increase them. I'm hoping
>     > > > > > > > > that adding artifact caching will speed up future builds to
>     > > > > > > > > the point that test runs and builds can be executed under
>     > > > > > > > > the global limit (which is ~50 minutes).
>     > > > > > > > >
>     > > > > > > > > -Kellen
>     > > > > > > > >
>     > > > > > > > > On Fri, Sep 28, 2018 at 4:05 PM Qin, Zhennan wrote:
>     > > > > > > > >
>     > > > > > > > > > Hi MXNet devs,
>     > > > > > > > > >
>     > > > > > > > > > I've been struggling with the new Travis CI for a while;
>     > > > > > > > > > it always times out for this PR:
>     > > > > > > > > > https://github.com/apache/incubator-mxnet/pull/12530
>     > > > > > > > > >
>     > > > > > > > > > Most of the time, Jenkins CI passes, while Travis can't
>     > > > > > > > > > finish within 50 minutes. This PR shouldn't affect the
>     > > > > > > > > > build time or unit test time much. I've also seen other
>     > > > > > > > > > PRs with the same problem, e.g.:
>     > > > > > > > > >
>     > > > > > > > > > https://travis-ci.org/apache/incubator-mxnet/builds/433172088?utm_source=github_status&utm_medium=notification
>     > > > > > > > > > https://travis-ci.org/apache/incubator-mxnet/builds/434404305?utm_source=github_status&utm_medium=notification
>     > > > > > > > > >
>     > > > > > > > > > According to the Travis timestamps, all the passing PRs
>     > > > > > > > > > have small code changes and complete `make -j2` within
>     > > > > > > > > > 25s, but in the timeout cases `make -j2` needs about
>     > > > > > > > > > 1600s. Does Travis do an incremental build for each test?
>     > > > > > > > > > Shall we increase the time limit for large PRs? Can we add
>     > > > > > > > > > more timestamps for the build and unit test stages to help
>     > > > > > > > > > understand what's going on there?
>     > > > > > > > > >
>     > > > > > > > > > Thanks in advance,
>     > > > > > > > > > Zhennan
>     > > > > > > >
>     > > > > > > > --
>     > > > > > > > Yizhi Liu
>     > > > > > > > DMLC member
>     > > > > > > > Amazon Web Services
>     > > > > > > > Vancouver, Canada
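A toy sketch of the cache behaviour Kellen and Zhennan describe in the thread, assuming a simplified content-hash compile cache. It is not ccache's implementation or MXNet's CI configuration: real ccache also hashes inputs such as the compiler and its flags, and resolves includes through the preprocessor. The file names (`base.h`, `util.h`, `op_N.cc`, `io.cc`) and the helpers `cache_key`, `count_misses`, and `demo` are hypothetical. Each translation unit's key covers its own source plus the headers it includes, so editing a header that most files include invalidates most keys, and the build then costs about as much as a scratch build.

```python
import hashlib


def cache_key(source, includes, headers):
    """Toy cache key: hash of a translation unit's text plus its included headers.

    Assumption: a real compiler cache such as ccache also hashes the compiler,
    its flags, and other inputs; this model only hashes file contents.
    """
    h = hashlib.sha256()
    h.update(source.encode())
    for name in sorted(includes):
        h.update(name.encode())
        h.update(headers[name].encode())
    return h.hexdigest()


def count_misses(units, headers, cache):
    """Count units whose key is not cached (they would be recompiled),
    then record their keys so an identical rebuild is all hits."""
    misses = 0
    for source, includes in units.values():
        key = cache_key(source, includes, headers)
        if key not in cache:
            misses += 1
            cache.add(key)
    return misses


def demo():
    # Hypothetical project: 100 operator files include a widely used base.h,
    # and a single io.cc includes util.h.
    headers = {"base.h": "// v1", "util.h": "// v1"}
    units = {f"op_{i}.cc": (f"// op {i}", ["base.h"]) for i in range(100)}
    units["io.cc"] = ("// io", ["util.h"])

    cache = set()
    cold = count_misses(units, headers, cache)   # empty cache: everything compiles
    warm = count_misses(units, headers, cache)   # unchanged tree: all hits
    headers["util.h"] = "// v2"
    small = count_misses(units, headers, cache)  # niche header edit: only io.cc
    headers["base.h"] = "// v2"
    base = count_misses(units, headers, cache)   # base header edit: every op file
    return [cold, warm, small, base]


if __name__ == "__main__":
    print(demo())  # [101, 0, 1, 100]
```

In this model, hashing contents rather than timestamps is what lets a fresh CI checkout reuse earlier results at all; the flip side is that any change to a shared header changes every dependent key.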