mxnet-dev mailing list archives

From Lin Yuan <apefor...@gmail.com>
Subject Re: [DISCUSS] Build OSX builds in CI (possibly with TravisCI).
Date Wed, 05 Sep 2018 16:40:27 GMT
Hi Kellen,

Many thanks to you and Marco for your effort! I think this is a crucial
piece for improving MXNet stability.

To add some data points:
1) Customers using the CoreML to MXNet converter were blocked for a while
because the converter was broken and no unit test was in place to detect
that.
2) Developers on Mac could not verify their local commits because some unit
tests on master were broken. This wasted a lot of time and resources on the
Jenkins server to detect the failures.
3) Please consider running the CI on Mac OS 10.13, since this is the minimum
Mac OS version that supports CoreML (to support the CoreML to MXNet converter).

Best Regards,

Lin

On Wed, Sep 5, 2018, 3:02 AM kellen sunderland <kellen.sunderland@gmail.com>
wrote:

> I'm bumping this thread as we've recently had our first serious bug on
> MacOS that would have been caught by enabling Travis.
>
> I'm going to do a little experimental work together with Marco with the
> goal of enabling a minimal Travis build that will run python tests.  So far
> I've verified that Travis will in fact find a bug that currently exists in
> master and has been reproduced by MacOS clients.  This indicates to me that
> adding Travis will add value to our CI.
>
> My best guess is that it might take us some iteration before we find a
> scalable way to integrate Travis.  Given this we're going to enable Travis
> in non-blocking mode (i.e. failures are safe to ignore for the time being).
>
> To help mitigate the risk of timeouts, and to remove legacy code, I'm going
> to re-create the .travis.yml file from scratch.  I think it'll be much less
> confusing if we only have working code related to Travis in our codebase,
> so that contributors won't have to experiment to see what is or isn't
> working.  We've got some great, but slightly out-of-date functionality in
> the legacy .travis.yml file.  I hope we can work together to update the
> legacy features, ensure they work with the current folder structure and
> also make sure the features run within Travis's 45 minute global time
> window.
>
> I'd also like to set expectations that this is strictly a volunteer
> effort.  I'd welcome help from the community for support and maintenance.
> The model downloading caching work particularly stands out to me as
> something I'd like to re-enable again as soon as possible.
>
> -Kellen
>
> On Tue, Jan 9, 2018 at 11:52 AM Marco de Abreu <
> marco.g.abreu@googlemail.com>
> wrote:
>
> > Looks good! +1
> >
> > On Tue, Jan 9, 2018 at 10:24 AM, kellen sunderland <
> > kellen.sunderland@gmail.com> wrote:
> >
> > > > I think most were in favour of at a minimum creating a clang build so I've
> > > > created a PR
> > > > https://github.com/apache/incubator-mxnet/pull/9330/commits/84089ea14123ebe4d66cc92e82a2d529cfbd8b19.
> > > > My hope is this will catch many of the issues blocking OSX builds.  In fact
> > > > it already caught one issue.  If you guys are in favour I can remove the
> > > > WIP and ask that it be merged.
> > >
> > > On Thu, Jan 4, 2018 at 6:29 PM, Chris Olivier <cjolivier01@gmail.com>
> > > wrote:
> > >
> > > > Nope, I have been on vacation.
> > > >
> > > > On Thu, Jan 4, 2018 at 9:10 AM, kellen sunderland <
> > > > kellen.sunderland@gmail.com> wrote:
> > > >
> > > > > > Hope everyone had a good break.  Just wanted to check if there were
> > > > > > further thoughts on OSX builds.  Chris, did you have time to look into
> > > > > > virtualizing Mac OS?  Would it make sense for us to put something in
> > > > > > place in the interim e.g. the clang solution?
> > > > >
> > > > > > On Tue, Dec 12, 2017 at 7:59 PM, de Abreu, Marco <mabreu@amazon.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for looking into this, Chris! No hurries on that one, we’ll
> > > > > > > look into it next stage when we add new system- and
> > > > > > > build-configurations to the CI.
> > > > > > >
> > > > > > > On 12.12.17, 19:12, "Chris Olivier" <cjolivier01@gmail.com> wrote:
> > > > > >
> > > > > >     I am on vacation starting Thursday.
> > > > > >
> > > > > >     On Tue, Dec 12, 2017 at 9:49 AM kellen sunderland <
> > > > > >     kellen.sunderland@gmail.com> wrote:
> > > > > >
> > > > > > >     > Absolutely, let's do an investigation and see if it's possible to
> > > > > > >     > virtualize.  Would you have time to look into it a bit further?
> > > > > >     >
> > > > > > >     > On Tue, Dec 12, 2017 at 6:47 PM, Chris Olivier <cjolivier01@gmail.com>
> > > > > > >     > wrote:
> > > > > >     >
> > > > > > >     > > Don’t get me wrong, I’m not saying this Mac OS Jenkins solution is
> > > > > > >     > > doable but I feel like we should investigate because the payoff
> > > > > > >     > > would be large.
> > > > > >     > >
> > > > > >     > >
> > > > > > >     > > On Tue, Dec 12, 2017 at 9:38 AM Chris Olivier <cjolivier01@gmail.com>
> > > > > > >     > > wrote:
> > > > > > >     > >
> > > > > > >     > > > Apple’s Darwin OS Is recently open-sourced.
> > > > > > >     > > > https://github.com/PureDarwin/PureDarwin
> > > > > > >     > > >
> > > > > > >     > > > How to convert this into a non-GUI VM I am not sure but I am
> > > > > > >     > > > willing to bet that people have done it already.
> > > > > > >     > > >
> > > > > > >     > > > On Tue, Dec 12, 2017 at 9:16 AM kellen sunderland <
> > > > > > >     > > > kellen.sunderland@gmail.com> wrote:
> > > > > >     > > >
> > > > > > >     > > >> It might be technically possible, but I think it would violate the
> > > > > > >     > > >> MacOS license: http://store.apple.com/Catalog/US/Images/MacOSX.htm
> > > > > > >     > > >>
> > > > > > >     > > >> "2. Permitted License Uses and Restrictions.
> > > > > > >     > > >> A. This License allows you to install and use one copy of the Apple
> > > > > > >     > > >> Software on a single Apple-labeled computer at a time. This License
> > > > > > >     > > >> does not allow the Apple Software to exist on more than one computer
> > > > > > >     > > >> at a time, and you may not make the Apple Software available over a
> > > > > > >     > > >> network where it could be used by multiple computers at the same
> > > > > > >     > > >> time. You may make one copy of the Apple Software (excluding the
> > > > > > >     > > >> Boot ROM code) in machine-readable form for backup purposes only;
> > > > > > >     > > >> provided that the backup copy must include all copyright or other
> > > > > > >     > > >> proprietary notices contained on the original. "
> > > > > > >     > > >>
> > > > > > >     > > >> I could be wrong though, does anyone know the details of MacOS
> > > > > > >     > > >> licensing / virtualization?
> > > > > > >     > > >>
> > > > > > >     > > >> On Tue, Dec 12, 2017 at 6:10 PM, Chris Olivier <cjolivier01@gmail.com>
> > > > > > >     > > >> wrote:
> > > > > >     > > >>
> > > > > > >     > > >> > googling seems to be full of running OSX (and even open-sourced
> > > > > > >     > > >> > PureDarwin) in VMs. One could conceivably run a VM on an EC2
> > > > > > >     > > >> > instance, right?
> > > > > > >     > > >> >
> > > > > > >     > > >> > On Tue, Dec 12, 2017 at 9:01 AM kellen sunderland <
> > > > > > >     > > >> > kellen.sunderland@gmail.com> wrote:
> > > > > >     > > >> >
> > > > > > >     > > >> > > It would be ideal if we could cover OSX in Jenkins, but the only
> > > > > > >     > > >> > > solution that I'm aware of would require physical machines to be
> > > > > > >     > > >> > > the workers.  I would be weakly opposed to having physical servers
> > > > > > >     > > >> > > running on PRs.  The downsides that I see in order of importance:
> > > > > > >     > > >> > >
> > > > > > >     > > >> > > -  We can't autoscale physical hardware.  If we find that the load
> > > > > > >     > > >> > > is too high we have to buy more machines.
> > > > > > >     > > >> > > -  Security would be tricky, as they'd have to be connected to the
> > > > > > >     > > >> > > internet and then to our Jenkins master instance.  Connecting via a
> > > > > > >     > > >> > > wired network would probably not be possible on most corporate
> > > > > > >     > > >> > > networks as these machines are by definition running arbitrary code
> > > > > > >     > > >> > > from the internet.  Many corporate sites have public wifi that this
> > > > > > >     > > >> > > machine could potentially connect to, but then our PRs start
> > > > > > >     > > >> > > failing if the wifi disconnects temporarily.  To connect to the
> > > > > > >     > > >> > > master we would need to set up a VPN solution with endpoints in our
> > > > > > >     > > >> > > VPC on AWS.  This is possible but would probably require a lot of
> > > > > > >     > > >> > > security work.
> > > > > > >     > > >> > > -  We can't just create a simple startup script or yaml file that
> > > > > > >     > > >> > > is checked into GitHub to manage the machine.  Someone will
> > > > > > >     > > >> > > actually have to physically administer the machine, apply updates,
> > > > > > >     > > >> > > etc. which will make community ownership difficult.
> > > > > > >     > > >> > >
> > > > > > >     > > >> > > Specific to an OSX build:
> > > > > > >     > > >> > > -  We can't virtualize OSX which means we'd only be able to cover
> > > > > > >     > > >> > > one OSX build environment per physical device.  We couldn't target
> > > > > > >     > > >> > > a matrix of OSX and Xcode versions as in Travis.
> > > > > > >     > > >> > >
> > > > > > >     > > >> > > -Kellen
> > > > > > >     > > >> > >
> > > > > > >     > > >> > > On Tue, Dec 12, 2017 at 5:46 PM, Chris Olivier <cjolivier01@gmail.com>
> > > > > > >     > > >> > > wrote:
> > > > > >     > > >> > >
> > > > > > >     > > >> > > > So why Travis when we could possibly use Jenkins?
> > > > > > >     > > >> > > >
> > > > > > >     > > >> > > > On Tue, Dec 12, 2017 at 7:59 AM Marco de Abreu <
> > > > > > >     > > >> > > > marco.g.abreu@googlemail.com>
> > > > > > >     > > >> > > > wrote:
> > > > > >     > > >> > > >
> > > > > > >     > > >> > > > > Yes that's correct, Chris.
> > > > > > >     > > >> > > > >
> > > > > > >     > > >> > > > > On 12.12.2017 at 4:46 PM, "Chris Olivier" <cjolivier01@gmail.com>
> > > > > > >     > > >> > > > > wrote:
> > > > > >     > > >> > > > >
> > > > > > >     > > >> > > > > > A quick google search seems to indicate that Mac can be used as
> > > > > > >     > > >> > > > > > a Jenkins slave. Is this correct?
> > > > > > >     > > >> > > > > >
> > > > > > >     > > >> > > > > > On Tue, Dec 12, 2017 at 7:42 AM Steffen Rochel <steffenrochel@gmail.com>
> > > > > > >     > > >> > > > > > wrote:
> > > > > >     > > >> > > > > >
> > > > > > >     > > >> > > > > > > +1 for #1 and #2
> > > > > > >     > > >> > > > > > >
> > > > > > >     > > >> > > > > > > I’m working on getting a MacPro to add to CI system.
> > > > > > >     > > >> > > > > > > On Tue, Dec 12, 2017 at 1:43 AM kellen sunderland <
> > > > > > >     > > >> > > > > > > kellen.sunderland@gmail.com> wrote:
> > > > > >     > > >> > > > > > >
> > > > > > >     > > >> > > > > > > > Background: TravisCI is a startup providing managed continuous
> > > > > > >     > > >> > > > > > > > integration services with GitHub integration and YAML based
> > > > > > >     > > >> > > > > > > > configuration.  TravisCI is one of the few CI providers that will
> > > > > > >     > > >> > > > > > > > build a variety of OSX/MacOS builds for software projects.  Their
> > > > > > >     > > >> > > > > > > > pricing ranges from Free (for open source, 1 concurrent job) to
> > > > > > >     > > >> > > > > > > > $489 monthly for 10 concurrent jobs.
> > > > > > >     > > >> > > > > > > >
> > > > > > >     > > >> > > > > > > > Problem: We’ve had a few OSX build issues slip into MXNet master
> > > > > > >     > > >> > > > > > > > in the past few weeks.  We’ve previously had a Travis CI based
> > > > > > >     > > >> > > > > > > > testing system that would have caught these issues.
> > > > > > >     > > >> > > > > > > >
> > > > > > >     > > >> > > > > > > > Proposals so far:
> > > > > > >     > > >> > > > > > > >
> > > > > > >     > > >> > > > > > > > 1) Use TravisCI in its free mode for a very minimal sanity check
> > > > > > >     > > >> > > > > > > > on OSX.  If we compile the program, and for example run C++ unit
> > > > > > >     > > >> > > > > > > > tests we’re unlikely to run into problems with queued builds.  The
> > > > > > >     > > >> > > > > > > > total build time here should be less than 15 minutes.
> > > > > > >     > > >> > > > > > > > Configuration should be quite simple and easy to maintain.  Error
> > > > > > >     > > >> > > > > > > > messages should also be obvious to contributors.
> > > > > > >     > > >> > > > > > > > 2) Run clang in Linux with our current CI.  Building with clang
> > > > > > >     > > >> > > > > > > > should take less than 10 minutes, should flush out a large subset
> > > > > > >     > > >> > > > > > > > of the issues we’ve seen with OSX, and be quite easy to maintain.
> > > > > > >     > > >> > > > > > > > 3) Run full test-suites in TravisCI, equaling the level of
> > > > > > >     > > >> > > > > > > > coverage we provide to Linux in Jenkins.  This could require us to
> > > > > > >     > > >> > > > > > > > subscribe to a monthly package with Travis to ensure our build
> > > > > > >     > > >> > > > > > > > queue doesn’t grow to an unacceptable length.  It may also require
> > > > > > >     > > >> > > > > > > > a volunteer to set up and maintain long-term.
> > > > > > >     > > >> > > > > > > >
> > > > > > >     > > >> > > > > > > > I’d +1 #1 and #2 as I think those should be low-cost,
> > > > > > >     > > >> > > > > > > > low-maintenance solutions that should catch the majority of the
> > > > > > >     > > >> > > > > > > > problems we’ve seen thus far.
> > > > > > >     > > >> > > > > > > >
> > > > > > >     > > >> > > > > > > > -Kellen
> > > > > >     > > >> > > > > > > >
> > > > > >     > > >> > > > > > >
> > > > > >     > > >> > > > > >
> > > > > >     > > >> > > > >
> > > > > >     > > >> > > >
> > > > > >     > > >> > >
> > > > > >     > > >> >
> > > > > >     > > >>
> > > > > >     > > >
> > > > > >     > >
> > > > > >     >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
