mxnet-dev mailing list archives

From Qing Lan <lanking...@live.com>
Subject Re: Time out for Travis CI
Date Tue, 02 Oct 2018 01:24:34 GMT
From the link, it looks like "Travis CI offers a free account" rather than Apache buying a plan.
It may just be a free user account with an extension on the number of nodes it can run on. I think
we may need to reach out to Travis or Apache to clarify whether we currently have the service
that the paid version has, rather than an extension of a "free user account".

Thanks,
Qing

On 10/1/18, 6:15 PM, "kellen sunderland" <kellen.sunderland@gmail.com> wrote:

    I actually thought we were already using a paid plan through Apache
    https://blogs.apache.org/infra/entry/apache_gains_additional_travis_ci
    
    On Tue, Oct 2, 2018, 3:11 AM Qing Lan <lanking520@live.com> wrote:
    
    > Are we currently on a free plan? If we are, probably the unlimited build
    > minutes would help
    >
    > Thanks,
    > Qing
    >
    > On 10/1/18, 6:08 PM, "kellen sunderland" <kellen.sunderland@gmail.com>
    > wrote:
    >
    >     Does the global timeout change for paid plans?  I looked into it
    >     briefly but didn't see anything that would indicate it does.
    >
    >     On Tue, Oct 2, 2018, 2:25 AM Pedro Larroy <pedro.larroy.lists@gmail.com>
    >     wrote:
    >
    >     > I think there are two approaches we can take to mitigate the build &
    >     > test time problem: on one hand, use a paid Travis CI plan; on the
    >     > other, group the unit tests into suites and only run a core set of
    >     > tests, as we should do on devices, although in that case we reduce
    >     > coverage.
    >     >
    >     > https://travis-ci.com/plans
    >     >
    >     > Pedro.
    >     >
    >     > On Sat, Sep 29, 2018 at 6:53 PM YiZhi Liu <eazhi.liu@gmail.com> wrote:
    >     >
    >     > > This makes sense. Thanks
    >     > >
    >     > > On Sat, Sep 29, 2018 at 6:36 PM kellen sunderland <
    >     > > kellen.sunderland@gmail.com> wrote:
    >     > >
    >     > > > Hey Zhennan, yes this is the exact problem, and I agree with your
    >     > > > points completely.  This is why when we first added Travis we
    >     > > > attempted to communicate that it would be informational only, and
    >     > > > that we'd need to iterate on the config before it would be a test
    >     > > > that people should consider 'required'.  Apologies, we should have
    >     > > > been more straightforward about those tradeoffs.  The strong point
    >     > > > in favour of adding Travis in informational mode was that we had a
    >     > > > serious MacOS specific bug that we wanted to verify was fixed.
    >     > > >
    >     > > > The good news is I've opened a PR which I hope will speed up these
    >     > > > builds to the point that they won't rely on caching.  Once it is
    >     > > > merged it would be very helpful if you could rebase on this PR and
    >     > > > test to ensure that large changes no longer hit the global timeout
    >     > > > without cache.
    >     > > > https://github.com/apache/incubator-mxnet/pull/12706
    >     > > >
    >     > > > On Sun, Sep 30, 2018 at 2:48 AM Qin, Zhennan <zhennan.qin@intel.com>
    >     > > > wrote:
    >     > > >
    >     > > > > Hi YiZhi and Kellen,
    >     > > > >
    >     > > > > From my point of view, Travis should be able to pass from a
    >     > > > > scratch build. Having the result depend on a ccache hit or miss
    >     > > > > is not a good idea. This PR changed many header files, so lots of
    >     > > > > files need to be recompiled, just like a scratch build. I think
    >     > > > > that's the reason Travis times out. This should be fixed before
    >     > > > > enabling Travis, as it will block any change to those base header
    >     > > > > files. Again, this is not a special case with this PR only; you
    >     > > > > can find the same problem on other PRs:
    >     > > > >
    >     > > > > https://travis-ci.org/apache/incubator-mxnet/builds/433172088?utm_source=github_status&utm_medium=notification
    >     > > > >
    >     > > > > https://travis-ci.org/apache/incubator-mxnet/builds/434404305?utm_source=github_status&utm_medium=notification
    >     > > > >
    >     > > > > Thanks,
    >     > > > > Zhennan
    >     > > > >
    >     > > > > -----Original Message-----
    >     > > > > From: YiZhi Liu [mailto:eazhi.liu@gmail.com]
    >     > > > > Sent: Sunday, September 30, 2018 5:15 AM
    >     > > > > To: eazhi.liu@gmail.com
    >     > > > > Cc: dev@mxnet.incubator.apache.org
    >     > > > > Subject: Re: Time out for Travis CI
    >     > > > >
    >     > > > > while other PRs are all good.
    >     > > > > On Sat, Sep 29, 2018 at 2:13 PM YiZhi Liu <eazhi.liu@gmail.com> wrote:
    >     > > > > >
    >     > > > > > Honestly I don't know yet. I can help to investigate. I'm just
    >     > > > > > going by the evidence that Travis times out every time it gets
    >     > > > > > re-triggered, 2 times at least. Correct me if I'm wrong @ Zhennan
    >     > > > > >
    >     > > > > > On Sat, Sep 29, 2018 at 1:54 PM kellen sunderland <kellen.sunderland@gmail.com> wrote:
    >     > > > > > >
    >     > > > > > > Reading over the PR I don't see what aspects would cause extra
    >     > > > > > > runtime. YiZhi, could you point them out?
    >     > > > > > >
    >     > > > > > > On Sat, Sep 29, 2018 at 8:46 PM YiZhi Liu <eazhi.liu@gmail.com>
    >     > > > > > > wrote:
    >     > > > > > >
    >     > > > > > > > Kellen, I think this PR introduces extra runtime in CI and
    >     > > > > > > > thus causes the timeout. That means that, once it is merged,
    >     > > > > > > > every later PR will see the same timeout in Travis.
    >     > > > > > > >
    >     > > > > > > > So shall we modify the changes to decrease the test running
    >     > > > > > > > time, or just disable the Travis CI?
    >     > > > > > > >
    >     > > > > > > >
    >     > > > > > > > On Fri, Sep 28, 2018 at 9:17 PM Qin, Zhennan
    >     > > > > > > > <zhennan.qin@intel.com>
    >     > > > > > > > wrote:
    >     > > > > > > > >
    >     > > > > > > > > Hi Kellen,
    >     > > > > > > > >
    >     > > > > > > > > Thanks for your explanation. Do you have a timeline for
    >     > > > > > > > > solving the timeout issue? Rebasing can't work in my case.
    >     > > > > > > > > Or shall we run it silently so that it can't vote 'X' on
    >     > > > > > > > > the overall CI result? Most developers are used to
    >     > > > > > > > > ignoring PRs with an 'X'.
    >     > > > > > > > >
    >     > > > > > > > > Thanks,
    >     > > > > > > > > Zhennan
    >     > > > > > > > >
    >     > > > > > > > > -----Original Message-----
    >     > > > > > > > > From: kellen sunderland [mailto:kellen.sunderland@gmail.com]
    >     > > > > > > > > Sent: Friday, September 28, 2018 10:38 PM
    >     > > > > > > > > To: dev@mxnet.incubator.apache.org
    >     > > > > > > > > Subject: Re: Time out for Travis CI
    >     > > > > > > > >
    >     > > > > > > > > Hey Zhennan, you're safe to ignore Travis failures for now.
    >     > > > > > > > > They're just informational.
    >     > > > > > > > >
    >     > > > > > > > > The reason you sometimes see quick builds and sometimes see
    >     > > > > > > > > slow builds is that we're making use of ccache in between
    >     > > > > > > > > builds.  If your PR is similar to what's in master you
    >     > > > > > > > > should build very quickly, if not it's going to take a
    >     > > > > > > > > while and likely time out.  If you see timeouts rebasing
    >     > > > > > > > > may speed things up.  Unfortunately the timeouts are global
    >     > > > > > > > > and we're not able to increase them.  I'm hoping that
    >     > > > > > > > > adding artifact caching will speed up future builds to the
    >     > > > > > > > > point that test runs and builds can be executed in under
    >     > > > > > > > > the global limit (which is ~50 minutes).
    >     > > > > > > > >
    >     > > > > > > > > -Kellen
    >     > > > > > > > >
    >     > > > > > > > >
    >     > > > > > > > > On Fri, Sep 28, 2018 at 4:05 PM Qin, Zhennan
    >     > > > > > > > > <zhennan.qin@intel.com> wrote:
    >     > > > > > > > >
    >     > > > > > > > > > Hi MXNet devs,
    >     > > > > > > > > >
    >     > > > > > > > > > I've been struggling with the new Travis CI for a while;
    >     > > > > > > > > > it always times out for this PR:
    >     > > > > > > > > > https://github.com/apache/incubator-mxnet/pull/12530
    >     > > > > > > > > >
    >     > > > > > > > > > Most of the time, Jenkins CI passes, while Travis can't
    >     > > > > > > > > > finish within 50 minutes. This PR shouldn't affect the
    >     > > > > > > > > > build time or unit test time much. Also, I saw other PRs
    >     > > > > > > > > > with the same problem, e.g.
    >     > > > > > > > > >
    >     > > > > > > > > > https://travis-ci.org/apache/incubator-mxnet/builds/433172088?utm_source=github_status&utm_medium=notification
    >     > > > > > > > > >
    >     > > > > > > > > > https://travis-ci.org/apache/incubator-mxnet/builds/434404305?utm_source=github_status&utm_medium=notification
    >     > > > > > > > > >
    >     > > > > > > > > > According to the timestamps from Travis, all passing PRs
    >     > > > > > > > > > have small code changes and can complete `make -j2`
    >     > > > > > > > > > within 25s. But in the timeout case, `make -j2` needs
    >     > > > > > > > > > about 1600s. Does Travis do an incremental build for
    >     > > > > > > > > > each test? Shall we increase the time limit for large
    >     > > > > > > > > > PRs? Can we add more timestamps for the build and unit
    >     > > > > > > > > > test stages to help understand what's going on there?
    >     > > > > > > > > >
    >     > > > > > > > > > Thanks in advance,
    >     > > > > > > > > > Zhennan
    >     > > > > > > > > >
    >     > > > > > > >
    >     > > > > > > >
    >     > > > > > > >
    >     > > > > > > > --
    >     > > > > > > > Yizhi Liu
    >     > > > > > > > DMLC member
    >     > > > > > > > Amazon Web Services
    >     > > > > > > > Vancouver, Canada
    >     > > > > > > >
    >     > > > > >
    >     > > > > >
    >     > > > > >
    >     > > > > > --
    >     > > > > > Yizhi Liu
    >     > > > > > DMLC member
    >     > > > > > Amazon Web Services
    >     > > > > > Vancouver, Canada
    >     > > > >
    >     > > > >
    >     > > > >
    >     > > > > --
    >     > > > > Yizhi Liu
    >     > > > > DMLC member
    >     > > > > Amazon Web Services
    >     > > > > Vancouver, Canada
    >     > > > >
    >     > > >
    >     > > --
    >     > > Yizhi Liu
    >     > > DMLC member
    >     > > Amazon Web Services
    >     > > Vancouver, Canada
    >     > >
    >     >
    >
    >
    >
    
