mxnet-dev mailing list archives

From kellen sunderland <kellen.sunderl...@gmail.com>
Subject Re: CI failure due to offline llvm.org
Date Fri, 12 Jan 2018 07:56:54 GMT
Doing a few searches, I see that apt.llvm.org doesn't appear to be stable
enough for CI. I'm going to write something today to hopefully make it a
little more stable, while still allowing those at home to have easily
reproducible build steps through Docker. What I'd propose is that we cache
the 15 or so deb packages that get installed with clang in S3 in the CI
environment. Home users who can't reach the cached S3 bucket fall back to
installing from apt.llvm.org. Does that sound like a reasonable plan,
Marco?
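A minimal Python sketch of the fallback proposed above: try the CI package cache first and use apt.llvm.org only when the cache is unreachable. The cache URL and the names `pick_source` and `default_probe` are hypothetical; the thread doesn't name the bucket or say how reachability would be checked.

```python
# Hypothetical sketch: choose a package source in priority order.
import urllib.request
from typing import Callable, Iterable, Optional

# Assumed locations; the real S3 bucket name and layout are not specified.
CI_CACHE_URL = "https://example-ci-cache.s3.amazonaws.com/llvm-debs"
UPSTREAM_URL = "https://apt.llvm.org"


def default_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers an HTTP request within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def pick_source(sources: Iterable[str],
                probe: Callable[[str], bool] = default_probe) -> Optional[str]:
    """Return the first reachable source, in priority order, or None."""
    for url in sources:
        if probe(url):
            return url
    return None
```

In CI this would be called as `pick_source([CI_CACHE_URL, UPSTREAM_URL])`; a home user without access to the bucket simply falls through to apt.llvm.org. Injecting `probe` keeps the ordering logic testable without network access.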

On Fri, Jan 12, 2018 at 8:21 AM, Marco de Abreu <marco.g.abreu@googlemail.com> wrote:

> Ah, I understand. You're right, we should revisit our decisions. I'll put
> it into the backlog so I don't forget it.
>
> -Marco
>
> On 12.01.2018 at 2:48 AM, "Chris Olivier" <cjolivier01@gmail.com> wrote:
>
> Yeah, I'm just saying the whole delete was done as a drastic measure at the
> time. It may not be necessary to re-pull everything. Instead of deleting
> everything, you could delete everything *except* the .git dir and then
> check out the commit you want, and it'll regenerate the sources from the
> .git database.
>
> This is, of course, assuming the .git database is never wrong... If
> something goes wrong, you can nuke the whole dir.
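A minimal Python sketch of the "delete everything except .git" idea, assuming the top-level .git directory holds the full object database (including submodule repositories under .git/modules, as in recent git versions). `wipe_except_git`, `restore_commands`, and `refresh_workspace` are hypothetical names, not existing Jenkinsfile steps.

```python
# Hypothetical sketch: wipe the working tree but keep .git, then let git
# regenerate the sources locally instead of re-cloning.
import os
import shutil
import subprocess
from typing import List


def wipe_except_git(workspace: str) -> None:
    """Delete every entry in `workspace` except the top-level .git directory."""
    for name in os.listdir(workspace):
        if name == ".git":
            continue
        path = os.path.join(workspace, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)  # regular files and symlinks


def restore_commands(commit: str) -> List[List[str]]:
    """Commands that rebuild the working tree from the local .git database."""
    return [
        # --force discards the deletions made by wipe_except_git().
        ["git", "checkout", "--force", commit],
        # Reuses submodule repositories kept in .git/modules where present,
        # so this usually avoids downloading submodules from scratch.
        ["git", "submodule", "update", "--init", "--recursive"],
    ]


def refresh_workspace(workspace: str, commit: str) -> None:
    wipe_except_git(workspace)
    for cmd in restore_commands(commit):
        subprocess.run(cmd, cwd=workspace, check=True)
```

If either git command fails, the fallback mentioned above still applies: remove the whole directory and clone again.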
>
>
> On Thu, Jan 11, 2018 at 5:42 PM, Marco de Abreu <marco.g.abreu@googlemail.com> wrote:
>
> > Exactly
> >
> > -Marco
> >
> > On Fri, Jan 12, 2018 at 2:40 AM, Chris Olivier <cjolivier01@gmail.com> wrote:
> >
> > > Actually, this is the commit related to it:
> > > https://github.com/cjolivier01/mxnet/commit/573a010879583885a0193e30dc0b8c848d80869b
> > >
> > > Before, the workspace directory wasn't being deleted. Now it is,
> > > correct? Everything under the top directory, right?
> > >
> > > So a git clone re-pulls everything?
> > >
> > > On Thu, Jan 11, 2018 at 4:51 PM, Marco de Abreu <marco.g.abreu@googlemail.com> wrote:
> > >
> > > > deleteDir() deletes the content of the current workspace.
> > > >
> > > > Okay, I haven't seen any errors related to lua-package not being
> > > > deleted. Do you have a CI link by any chance?
> > > >
> > > > -Marco
> > > >
> > > > On Fri, Jan 12, 2018 at 1:49 AM, Chris Olivier <cjolivier01@gmail.com> wrote:
> > > >
> > > > > What is the deleteDir() call doing in the Jenkinsfile?
> > > > > Yes, I mentioned the case where it wasn't getting cleaned.
> > > > >
> > > > > On Thu, Jan 11, 2018 at 4:41 PM, Marco de Abreu <marco.g.abreu@googlemail.com> wrote:
> > > > >
> > > > > > During git_init: First we just use git clean. If checkout fails,
> > > > > > we delete the entire workspace and retry.
> > > > > >
> > > > > > During build: First we use regular make. If the build fails, we
> > > > > > run make clean before executing make again.
> > > > > >
> > > > > > During test: No cleanup happens in case of failure.
> > > > > >
> > > > > > So far, I haven't noticed any files not being deleted in the
> > > > > > workspace. Do you know of an example?
> > > > > >
> > > > > > -Marco
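The soft-then-hard pattern described for git_init and build can be sketched as a small Python helper. `with_fallback`, `make`, and `make_clean` are hypothetical stand-ins for the actual Jenkinsfile stages; the test stage would call its step directly, with no fallback.

```python
# Hypothetical sketch of the soft-then-hard retry used in CI stages.
import subprocess
from typing import Callable


def with_fallback(step: Callable[[], None],
                  hard_cleanup: Callable[[], None]) -> None:
    """Run `step`; on failure, run `hard_cleanup` and retry `step` once."""
    try:
        step()
    except Exception:
        hard_cleanup()
        step()  # a second failure propagates and fails the stage


def make() -> None:
    subprocess.run(["make"], check=True)


def make_clean() -> None:
    subprocess.run(["make", "clean"], check=True)


# Build stage: plain make first, then make clean + make if that fails.
# with_fallback(make, make_clean)
```

For git_init, `step` would be the git clean plus checkout, and `hard_cleanup` would delete the workspace.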
> > > > > >
> > > > > > On Fri, Jan 12, 2018 at 1:34 AM, Chris Olivier <cjolivier01@gmail.com> wrote:
> > > > > >
> > > > > > > What approach is used now? I see in the Jenkinsfile that
> > > > > > > deleteDir() is called at the top of init_git() and
> > > > > > > init_git_win(). That deletes the whole directory, correct?
> > > > > > >
> > > > > > > Before, there were problems with 'git clean -d -f' *not*
> > > > > > > deleting some directories which were tracked on one branch and
> > > > > > > not on another, which I believe is why deleteDir() was put
> > > > > > > there. The directory I recall was something like lua-package,
> > > > > > > which was in someone's private repo or something like that...
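A hedged Python sketch of a stronger `git clean`. Per the git-clean documentation, a single `-f` leaves untracked nested repositories (directories with their own .git) in place, and ignored files survive unless `-x` is given; either could leave behind a directory that is tracked on one branch but not on another. Which of these affected lua-package is an assumption, and `clean_command` and `hard_clean` are hypothetical names.

```python
# Hypothetical sketch: a git clean that also removes leftovers that a plain
# `git clean -d -f` keeps.
import subprocess
from typing import List


def clean_command(remove_ignored: bool = True,
                  remove_nested_repos: bool = True) -> List[str]:
    cmd = ["git", "clean", "-d", "-f"]
    if remove_nested_repos:
        # A second -f lets git delete untracked directories that contain
        # their own .git (nested repositories); one -f leaves them alone.
        cmd.append("-f")
    if remove_ignored:
        # -x skips the .gitignore rules, so ignored build outputs are removed
        # and directories holding only ignored files go away too.
        cmd.append("-x")
    return cmd


def hard_clean(repo: str) -> None:
    # Restore tracked files first, then remove everything untracked.
    subprocess.run(["git", "reset", "--hard"], cwd=repo, check=True)
    subprocess.run(clean_command(), cwd=repo, check=True)
```

The resulting command is equivalent to `git clean -ffdx`.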
> > > > > > >
> > > > > > > On Thu, Jan 11, 2018 at 4:02 PM, Marco de Abreu <marco.g.abreu@googlemail.com> wrote:
> > > > > > >
> > > > > > > > While deleting the entire workspace is quite a harsh solution,
> > > > > > > > I think it's a good approach. Git checkout takes between 2 and
> > > > > > > > 10 seconds, so I don't think we need to optimize in that
> > > > > > > > regard.
> > > > > > > >
> > > > > > > > git clean is our 'soft' approach to cleanup. Deleting the
> > > > > > > > workspace is the 'hard' approach, so this shouldn't be an
> > > > > > > > issue.
> > > > > > > >
> > > > > > > > But there is one catch: Windows builds are not containerized,
> > > > > > > > and even when we delete the workspace, there could still be a
> > > > > > > > lot of files elsewhere on the host that we are not tracking. In
> > > > > > > > the future I'd like to have at least a file-system layer
> > > > > > > > between our tests and the host, but we will have to analyze
> > > > > > > > whether something like this exists. At the moment, we even
> > > > > > > > have tests writing to system32.
> > > > > > > >
> > > > > > > > -Marco
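Short of a real file-system layer, one way to at least detect tests writing outside the workspace is to snapshot a directory tree before and after a test run. This Python sketch is an assumption, not existing CI tooling; the watched root (for example C:\Windows\System32) and the names `snapshot` and `diff` are hypothetical, and walking a large system directory may be slow.

```python
# Hypothetical sketch: detect files added, removed, or modified under a root.
import os
from typing import Dict, List, Tuple

Snapshot = Dict[str, Tuple[int, int]]  # relative path -> (size, mtime_ns)


def snapshot(root: str) -> Snapshot:
    """Record size and modification time for every file under `root`."""
    result: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            result[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return result


def diff(before: Snapshot, after: Snapshot) -> Dict[str, List[str]]:
    """Return files that were added, removed, or modified between snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(p for p in before.keys() & after.keys()
                           if before[p] != after[p]),
    }
```

A CI step could take a snapshot before the test stage, another after it, and fail or warn if `diff` reports anything outside the workspace.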
> > > > > > > >
> > > > > > > > On Fri, Jan 12, 2018 at 12:44 AM, Chris Olivier <cjolivier01@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Ok, but still on that note: I remember that before, when some
> > > > > > > > > problems were being fixed in CI (before your time), they
> > > > > > > > > switched to deleting the entire source directory, ".git"
> > > > > > > > > subdirectory and all. At the time, the CI was in such a
> > > > > > > > > chaotic state that I didn't make an issue of it, but now that
> > > > > > > > > it has stabilized (for the most part, today's incident
> > > > > > > > > notwithstanding), I think that we may want to revisit it if it
> > > > > > > > > is still doing that. You could, for example, just delete
> > > > > > > > > everything except the .git directory and then do a
> > > > > > > > > 'git reset --hard' to get back to a baseline, rather than
> > > > > > > > > having to re-download everything every time (this should also
> > > > > > > > > speed up the builds).
> > > > > > > > >
> > > > > > > > > Note that 'git clean' was not working because it doesn't
> > > > > > > > > delete 'unknown' directories, which was the problem.
> > > > > > > > >
> > > > > > > > > WDYT?
> > > > > > > > >
> > > > > > > > > On Thu, Jan 11, 2018 at 3:26 PM, Marco de Abreu <marco.g.abreu@googlemail.com> wrote:
> > > > > > > > >
> > > > > > > > > > This happens because we just merged the clang compilation:
> > > > > > > > > > https://github.com/apache/incubator-mxnet/commit/2b73aac527a3439ec0dc9b1e76c6df09ea347eb1
> > > > > > > > > > This means that clang has to be installed on all slaves;
> > > > > > > > > > after some time, the Docker images will be cached. The
> > > > > > > > > > problem right now is that their apt server is unavailable,
> > > > > > > > > > which means the initial installation that creates the Docker
> > > > > > > > > > cache doesn't succeed. In the future, this will be cached.
> > > > > > > > > >
> > > > > > > > > > -Marco
> > > > > > > > > >
> > > > > > > > > > On Thu, Jan 11, 2018 at 11:45 PM, Chris Olivier <cjolivier01@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Do we download all submodules from scratch on every build?
> > > > > > > > > > > If we do, then we should probably find a way not to, such
> > > > > > > > > > > as just doing a git reset or something like that.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jan 11, 2018 at 1:47 PM, Marco de Abreu <marco.g.abreu@googlemail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hello,
> > > > > > > > > > > >
> > > > > > > > > > > > We're currently experiencing a CI outage caused by
> > > > > > > > > > > > http://apt.llvm.org not being reachable.
> > > > > > > > > > > >
> > > > > > > > > > > > Best regards,
> > > > > > > > > > > > Marco
> > > > > > > > > > > >
