mxnet-dev mailing list archives

From Chris Olivier <cjolivie...@gmail.com>
Subject Re: CI failure due to offline llvm.org
Date Fri, 12 Jan 2018 14:19:46 GMT
btw I think a manual delete of some sort is still necessary, as we found
that git clean (with the proper options) does not work 100% of the time. At
the time, we found a reproducible situation in which it failed, and it was
breaking every build on that machine.
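
For reference, the "delete everything except .git, then restore from the
local database" idea discussed further down the thread could be sketched
roughly as below. This is Python rather than the Groovy used in the
Jenkinsfile, and the function names (purge_except_git, restore_checkout) are
made up for illustration. It assumes the .git database itself is intact and
that git keeps submodule repositories under .git/modules, as reasonably
recent versions do.

```python
import shutil
import subprocess
from pathlib import Path


def purge_except_git(workspace: Path) -> list[str]:
    """Delete every top-level entry in `workspace` except `.git`.

    Returns the sorted names of the removed entries.
    """
    removed = []
    # Materialize the listing first so we never delete while iterating.
    for entry in list(workspace.iterdir()):
        if entry.name == ".git":
            continue  # keep the object database so nothing is re-downloaded
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return sorted(removed)


def restore_checkout(workspace: Path, commit: str) -> None:
    """Regenerate the working tree for `commit` from the local .git database."""
    # --force discards the "deleted file" state left behind by the purge.
    subprocess.run(
        ["git", "checkout", "--force", commit], cwd=workspace, check=True
    )
    # Submodule working trees were purged too; re-create them (assumption:
    # their repositories live under .git/modules and need no re-download).
    subprocess.run(
        ["git", "submodule", "update", "--init", "--recursive", "--force"],
        cwd=workspace,
        check=True,
    )
```

If the checkout still fails after this, deleting the whole workspace and
cloning again remains the fallback.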


On Fri, Jan 12, 2018 at 5:30 AM Marco de Abreu <marco.g.abreu@googlemail.com>
wrote:

> Seems right to me, but I will have to investigate. I noted it down.
>
> -Marco
>
> On 12.01.2018 at 1:21 PM, "Pedro Larroy" <pedro.larroy.lists@gmail.com>
> wrote:
>
> > I think Chris is right, git clean with the right options plus proper
> > initialization of the submodules should not make any difference versus
> > deleting the entire workspace. Right?
> >
> > On Fri, Jan 12, 2018 at 8:56 AM, kellen sunderland
> > <kellen.sunderland@gmail.com> wrote:
> > > Doing a few searches I see that llvm.org <http://apt.llvm.org> doesn't
> > > appear to be stable enough for CI.  I'm going to write something to
> > > hopefully make it a little more stable today, while still allowing those
> > > at home to have easily reproducible build steps through docker.  What I'd
> > > propose is we cache the 15 or so deb packages that get installed with
> > > clang in s3 in the CI env.  For home users who can't reach the cached s3
> > > bucket we fall back to apt.llvm.org installation.  Sound like a
> > > reasonable plan, Marco?
> > >
> > > On Fri, Jan 12, 2018 at 8:21 AM, Marco de Abreu <
> > > marco.g.abreu@googlemail.com> wrote:
> > >
> > >> Aah I understand, you're right, we should revisit our decisions. I'll
> > >> put it into the backlog so I don't forget it.
> > >>
> > >> -Marco
> > >>
> > >> On 12.01.2018 at 2:48 AM, "Chris Olivier" <cjolivier01@gmail.com> wrote:
> > >>
> > >> Yeah, I'm just saying the whole delete was done as a drastic measure at
> > >> the time. It may not be necessary to re-pull everything. Instead of
> > >> deleting everything, you could delete everything *except* the .git dir
> > >> and then check out the commit you want, and it'll regenerate the sources
> > >> from the .git database.
> > >>
> > >> This, of course, assumes the .git database is never wrong...  If
> > >> something goes wrong, you can nuke the whole dir.
> > >>
> > >>
> > >> On Thu, Jan 11, 2018 at 5:42 PM, Marco de Abreu <
> > >> marco.g.abreu@googlemail.com> wrote:
> > >>
> > >> > Exactly
> > >> >
> > >> > -Marco
> > >> >
> > >> > On Fri, Jan 12, 2018 at 2:40 AM, Chris Olivier <cjolivier01@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Actually, this is the commit related to it.
> > >> > > https://github.com/cjolivier01/mxnet/commit/573a010879583885a0193e30dc0b8c848d80869b
> > >> > >
> > >> > > Before, the workspace directory wasn't being deleted.  Now it is,
> > >> > > correct? Everything under the top directory, right?
> > >> > >
> > >> > > So a git clone re-pulls everything?
> > >> > >
> > >> > > On Thu, Jan 11, 2018 at 4:51 PM, Marco de Abreu <
> > >> > > marco.g.abreu@googlemail.com> wrote:
> > >> > >
> > >> > > > deleteDir() deletes the content of the current workspace.
> > >> > > >
> > >> > > > Okay, I haven't seen any errors related to lua-package not being
> > >> > > > deleted. Do you have a CI-link by any chance?
> > >> > > >
> > >> > > > -Marco
> > >> > > >
> > >> > > > On Fri, Jan 12, 2018 at 1:49 AM, Chris Olivier <cjolivier01@gmail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > What is the deleteDir() call doing in the Jenkinsfile?
> > >> > > > > Yes, I mentioned the case where it wasn't getting cleaned.
> > >> > > > >
> > >> > > > > On Thu, Jan 11, 2018 at 4:41 PM, Marco de Abreu <
> > >> > > > > marco.g.abreu@googlemail.com> wrote:
> > >> > > > >
> > >> > > > > > During git_init: First we're just using git clean; if checkout
> > >> > > > > > fails, we're deleting the entire workspace and retrying.
> > >> > > > > >
> > >> > > > > > During build: First we're using regular make. If the build fails,
> > >> > > > > > we're using make clean before executing make again.
> > >> > > > > >
> > >> > > > > > During test: No cleanup happening in case of failure.
> > >> > > > > >
> > >> > > > > > So far, I haven't noticed any files not being deleted in the
> > >> > > > > > workspace. Do you know an example?
> > >> > > > > >
> > >> > > > > > -Marco
> > >> > > > > >
> > >> > > > > > On Fri, Jan 12, 2018 at 1:34 AM, Chris Olivier
> > >> > > > > > <cjolivier01@gmail.com> wrote:
> > >> > > > > >
> > >> > > > > > > What approach is used now?  I see in the Jenkinsfile that
> > >> > > > > > > deleteDir() is called at the top of init_git() and
> > >> > > > > > > init_git_win().  That deletes the whole directory, correct?
> > >> > > > > > >
> > >> > > > > > > Before, there were problems with 'git clean -d -f' *not*
> > >> > > > > > > deleting some directories which were tracked on one branch and
> > >> > > > > > > not on another, which I believe is why deleteDir() was put
> > >> > > > > > > there.  The directory I recall was something like lua-package
> > >> > > > > > > or something that was in someone's private repo or something
> > >> > > > > > > like that...
> > >> > > > > > >
> > >> > > > > > > On Thu, Jan 11, 2018 at 4:02 PM, Marco de Abreu <
> > >> > > > > > > marco.g.abreu@googlemail.com> wrote:
> > >> > > > > > >
> > >> > > > > > > > While it's a quite harsh solution to delete the entire
> > >> > > > > > > > workspace, I think that it's a good way. Git checkout takes
> > >> > > > > > > > between 2 and 10 seconds, so I don't think we need to
> > >> > > > > > > > optimize in that regard.
> > >> > > > > > > >
> > >> > > > > > > > git clean is our 'soft' approach to clean up. Deleting the
> > >> > > > > > > > workspace is the 'hard' approach, so this shouldn't be an
> > >> > > > > > > > issue.
> > >> > > > > > > >
> > >> > > > > > > > But there is one catch: Windows builds are not containerized
> > >> > > > > > > > and while we delete the workspace, there could still be a lot
> > >> > > > > > > > of files which are not being tracked. In future I'd like to
> > >> > > > > > > > have at least a file-system-layer in between our tests and
> > >> > > > > > > > the host, but we will have to analyze if something like this
> > >> > > > > > > > exists. At the moment, we even got tests writing to system32.
> > >> > > > > > > >
> > >> > > > > > > > -Marco
> > >> > > > > > > >
> > >> > > > > > > > On Fri, Jan 12, 2018 at 12:44 AM, Chris Olivier
> > >> > > > > > > > <cjolivier01@gmail.com> wrote:
> > >> > > > > > > >
> > >> > > > > > > > > Ok, but still on that note. I remember that before, when
> > >> > > > > > > > > some problems were being fixed in CI (before your time),
> > >> > > > > > > > > they switched to deleting the entire source directory,
> > >> > > > > > > > > ".git" subdirectory and all.  At the time, the CI was in
> > >> > > > > > > > > such a chaotic state that I didn't make an issue of it, but
> > >> > > > > > > > > now that it has stabilized (for the most part, today's
> > >> > > > > > > > > incident notwithstanding), I think that we may want to
> > >> > > > > > > > > revisit it if it is still doing that.  You could, for
> > >> > > > > > > > > example, just delete everything except the .git directory
> > >> > > > > > > > > and then do a 'git reset --hard' to get back a baseline
> > >> > > > > > > > > without having to re-download everything every time (this
> > >> > > > > > > > > should also speed up the builds).
> > >> > > > > > > > >
> > >> > > > > > > > > Note that 'git clean' was not working as it doesn't delete
> > >> > > > > > > > > 'unknown' directories, which was the problem.
> > >> > > > > > > > >
> > >> > > > > > > > > WDYT?
> > >> > > > > > > > >
> > >> > > > > > > > > On Thu, Jan 11, 2018 at 3:26 PM, Marco de Abreu <
> > >> > > > > > > > > marco.g.abreu@googlemail.com> wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > > This happens because we just merged the clang compilation:
> > >> > > > > > > > > > https://github.com/apache/incubator-mxnet/commit/2b73aac527a3439ec0dc9b1e76c6df09ea347eb1
> > >> > > > > > > > > > This means that clang has to get installed on all slaves
> > >> > > > > > > > > > and after some time, the docker images will be cached.
> > >> > > > > > > > > > The problem right now is that their apt-server is
> > >> > > > > > > > > > unavailable, which means the initial installation to
> > >> > > > > > > > > > create the docker cache doesn't succeed. In future, this
> > >> > > > > > > > > > will be cached.
> > >> > > > > > > > > >
> > >> > > > > > > > > > -Marco
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Thu, Jan 11, 2018 at 11:45 PM, Chris Olivier
> > >> > > > > > > > > > <cjolivier01@gmail.com> wrote:
> > >> > > > > > > > > >
> > >> > > > > > > > > > > Do we download all submodules from scratch every build?
> > >> > > > > > > > > > > If we do, then we should probably find a way not to.
> > >> > > > > > > > > > > I'd suggest just doing a git reset or something like
> > >> > > > > > > > > > > that.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > On Thu, Jan 11, 2018 at 1:47 PM Marco de Abreu <
> > >> > > > > > > > > > > marco.g.abreu@googlemail.com> wrote:
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > > Hello,
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > we're currently experiencing a CI outage caused by
> > >> > > > > > > > > > > > http://apt.llvm.org not being reachable.
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > Best regards,
> > >> > > > > > > > > > > > Marco
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
>
