mxnet-dev mailing list archives

From Marco de Abreu <marco.g.ab...@googlemail.com>
Subject Re: CI failure due to offline llvm.org
Date Fri, 12 Jan 2018 15:27:05 GMT
Okay, so the only disadvantage of deleting the entire workspace is that
we’ll have to pull the subrepos every single time? If git clean isn’t
working 100% of the time, shouldn’t we rather stick to the hard approach
and delete the whole workspace instead of risking a broken slave?

If I understand it right, the only advantage of using your proposed
approach is the fact that we won’t have to pull sub-repos, right?

-Marco

On Fri, Jan 12, 2018 at 3:19 PM, Chris Olivier <cjolivier01@gmail.com>
wrote:

> btw i think a manual delete of some sort is still necessary as we found
> that git clean (with the proper options) does not work 100% of the time. we
> found at the time a reproducible situation in which it does not and it was
> breaking every build on that machine.
>
>
> On Fri, Jan 12, 2018 at 5:30 AM Marco de Abreu <
> marco.g.abreu@googlemail.com>
> wrote:
>
> > Seems right to me, but I will have to investigate. I noted it down.
> >
> > -Marco
> >
> > On 12.01.2018 at 1:21 PM, "Pedro Larroy" <
> > pedro.larroy.lists@gmail.com> wrote:
> >
> > > I think Chris is right, git clean with the right options plus proper
> > > initialization of the submodules should not make any difference versus
> > > deleting the entire workspace. Right?
> > >
> > > On Fri, Jan 12, 2018 at 8:56 AM, kellen sunderland
> > > <kellen.sunderland@gmail.com> wrote:
> > > > Doing a few searches I see that llvm.org <http://apt.llvm.org> doesn't
> > > > appear to be stable enough for CI.  I'm going to write something to
> > > > hopefully make it a little more stable today, while still allowing those at
> > > > home to have easily reproducible build steps through docker.  What I'd
> > > > propose is we cache the 15 or so deb packages that get installed with clang
> > > > in s3 in the CI env.  For home users who can't reach the cached s3 bucket
> > > > we fall back to apt.llvm.org installation.  Sound like a reasonable plan
> > > > Marco?
> > > >
> > > > On Fri, Jan 12, 2018 at 8:21 AM, Marco de Abreu <
> > > > marco.g.abreu@googlemail.com> wrote:
> > > >
> > > >> Aah I understand, you're right, we should revisit our decisions. I'll put
> > > >> it into the backlog so I don't forget it.
> > > >>
> > > >> -Marco
> > > >>
> > > >> On 12.01.2018 at 2:48 AM, "Chris Olivier" <cjolivier01@gmail.com> wrote:
> > > >>
> > > >> Yeah, I'm just saying the whole delete was done as a drastic measure at the
> > > >> time. It may not be necessary to re-pull everything. Instead of deleting
> > > >> everything, you could delete everything *except* the .git dir. and then
> > > >> checkout the commit you want and it'll regenerate the sources from the .git
> > > >> database.
> > > >>
> > > >> This, of course, assuming the .git database is never wrong...  If something
> > > >> goes wrong, you can nuke the whole dir.
> > > >>
> > > >>
> > > >> On Thu, Jan 11, 2018 at 5:42 PM, Marco de Abreu <
> > > >> marco.g.abreu@googlemail.com> wrote:
> > > >>
> > > >> > Exactly
> > > >> >
> > > >> > -Marco
> > > >> >
> > > >> > On Fri, Jan 12, 2018 at 2:40 AM, Chris Olivier <cjolivier01@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Actually, this is the commit related to it.
> > > >> > > https://github.com/cjolivier01/mxnet/commit/573a010879583885a0193e30dc0b8c848d80869b
> > > >> > >
> > > >> > > Before, the workspace directory wasn't being deleted.  Now it is, correct?
> > > >> > > Everything under the top directory, right?
> > > >> > >
> > > >> > > So a git clone re-pulls everything?
> > > >> > >
> > > >> > > On Thu, Jan 11, 2018 at 4:51 PM, Marco de Abreu <
> > > >> > > marco.g.abreu@googlemail.com> wrote:
> > > >> > >
> > > >> > > > deleteDir() deletes the content of the current workspace
> > > >> > > >
> > > >> > > > Okay, I haven't seen any errors related to lua-package not being deleted.
> > > >> > > > Do you have a CI-link by any chance?
> > > >> > > >
> > > >> > > > -Marco
> > > >> > > >
> > > >> > > > On Fri, Jan 12, 2018 at 1:49 AM, Chris Olivier <cjolivier01@gmail.com>
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > > > what is deleteDir() call doing in Jenkinsfile?
> > > >> > > > > Yes, I mentioned the case where it wasn't getting cleaned.
> > > >> > > > >
> > > >> > > > > On Thu, Jan 11, 2018 at 4:41 PM, Marco de Abreu <
> > > >> > > > > marco.g.abreu@googlemail.com> wrote:
> > > >> > > > >
> > > >> > > > > > During git_init: First we're just using git clean, if checkout fails, we're
> > > >> > > > > > deleting the entire workspace and retrying.
> > > >> > > > > >
> > > >> > > > > > During build: First we're using regular make. If build fails, we're using
> > > >> > > > > > make clean before executing make again.
> > > >> > > > > >
> > > >> > > > > > During test: No cleanup happening in case of failure.
> > > >> > > > > >
> > > >> > > > > > So far, I haven't noticed any files not being deleted in the workspace. Do
> > > >> > > > > > you know an example?
> > > >> > > > > >
> > > >> > > > > > -Marco
> > > >> > > > > >
> > > >> > > > > > On Fri, Jan 12, 2018 at 1:34 AM, Chris Olivier <cjolivier01@gmail.com>
> > > >> > > > > > wrote:
> > > >> > > > > >
> > > >> > > > > > > What approach is used now?  I see in Jenkinsfile that deleteDir() is
> > > >> > > > > > > called at the top of init_git() and init_git_win(). That deletes the
> > > >> > > > > > > whole directory, correct?
> > > >> > > > > > >
> > > >> > > > > > > Before there were problems with 'git clean -d -f' *not* deleting some
> > > >> > > > > > > directories which were tracked on one branch and not on another, which I
> > > >> > > > > > > believe is why deleteDir() was put there. The directory I recall was
> > > >> > > > > > > something like lua-package or something that was in someone's private repo
> > > >> > > > > > > or something like that...
> > > >> > > > > > >
> > > >> > > > > > > On Thu, Jan 11, 2018 at 4:02 PM, Marco de Abreu <
> > > >> > > > > > > marco.g.abreu@googlemail.com> wrote:
> > > >> > > > > > >
> > > >> > > > > > > > While it's a quite harsh solution to delete the entire workspace, I think
> > > >> > > > > > > > that it's a good way. Git checkout takes between 2 and 10 seconds, so I
> > > >> > > > > > > > don't think we need to optimize in that regard.
> > > >> > > > > > > >
> > > >> > > > > > > > git clean is our 'soft' approach to clean up. Deleting the workspace is the
> > > >> > > > > > > > 'hard' approach, so this shouldn't be an issue.
> > > >> > > > > > > >
> > > >> > > > > > > > But there is one catch: Windows builds are not containerized and while we
> > > >> > > > > > > > delete the workspace, there could still be a lot of files which are not
> > > >> > > > > > > > being tracked. In future I'd like to have at least a file-system-layer in
> > > >> > > > > > > > between our tests and the host, but we will have to analyze if something
> > > >> > > > > > > > like this exists. At the moment, we even got tests writing to system32.
> > > >> > > > > > > >
> > > >> > > > > > > > -Marco
> > > >> > > > > > > >
> > > >> > > > > > > > On Fri, Jan 12, 2018 at 12:44 AM, Chris Olivier <cjolivier01@gmail.com>
> > > >> > > > > > > > wrote:
> > > >> > > > > > > >
> > > >> > > > > > > > > Ok, but still on that note. I remember before that when some problems were
> > > >> > > > > > > > > being fixed in CI (before your time), they switched to deleting the entire
> > > >> > > > > > > > > source directory, ".git" subdirectory and all.  At the time, the CI was in
> > > >> > > > > > > > > such a chaotic state that I didn't make an issue of it, but now that it
> > > >> > > > > > > > > has stabilized (for the most part, today's incident notwithstanding), I
> > > >> > > > > > > > > think that we may want to revisit it if it is still doing that.  you could,
> > > >> > > > > > > > > for example, just delete everything except the .git directory and then do a
> > > >> > > > > > > > > 'git reset --hard' to get back a baseline before having to re-download
> > > >> > > > > > > > > everything every time (also should speed up the builds).
> > > >> > > > > > > > >
> > > >> > > > > > > > > Note that 'git clean' was not working as it doesn't delete 'unknown'
> > > >> > > > > > > > > directories, which was the problem.
> > > >> > > > > > > > >
> > > >> > > > > > > > > WDYT?
> > > >> > > > > > > > >
> > > >> > > > > > > > > On Thu, Jan 11, 2018 at 3:26 PM, Marco de Abreu <
> > > >> > > > > > > > > marco.g.abreu@googlemail.com> wrote:
> > > >> > > > > > > > >
> > > >> > > > > > > > > > This happens because we just merged the clang compilation
> > > >> > > > > > > > > > https://github.com/apache/incubator-mxnet/commit/2b73aac527a3439ec0dc9b1e76c6df09ea347eb1.
> > > >> > > > > > > > > > This means that clang has to get installed on all slaves and after some
> > > >> > > > > > > > > > time, the docker images will be cached. The problem right now is that their
> > > >> > > > > > > > > > apt-server is unavailable, which means the initial installation to create
> > > >> > > > > > > > > > the docker cache doesn't succeed. In future, this will be cached.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > -Marco
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > On Thu, Jan 11, 2018 at 11:45 PM, Chris Olivier <cjolivier01@gmail.com>
> > > >> > > > > > > > > > wrote:
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > > do we download all submodules from scratch every build?  if we do then we
> > > >> > > > > > > > > > > should probably find a way not to. suggest just doing git reset or
> > > >> > > > > > > > > > > something like that
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > On Thu, Jan 11, 2018 at 1:47 PM Marco de Abreu <
> > > >> > > > > > > > > > > marco.g.abreu@googlemail.com> wrote:
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > Hello,
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > we're currently experiencing a CI outage caused by http://apt.llvm.org not
> > > >> > > > > > > > > > > > being reachable.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > Best regards,
> > > >> > > > > > > > > > > > Marco
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
>
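
The cleanup strategy discussed in this thread (first a 'soft' clean that
keeps the .git directory, then a 'hard' delete of the whole workspace if
that fails) could look roughly like the Python sketch below. This is not
the project's Jenkinsfile code: the function names, the `run` parameter,
the remote name "origin" and the exact git commands are assumptions made
for illustration. It also assumes that submodule data stored under .git
survives the soft clean, so submodules would normally not need to be
downloaded again.

```python
import shutil
import subprocess
from pathlib import Path


def run_git(args, cwd):
    """Run one git command; raises CalledProcessError if git fails."""
    subprocess.run(["git", *args], cwd=cwd, check=True)


def soft_clean(workspace, commit, run=run_git):
    """Delete everything except .git, then restore sources from it."""
    for entry in workspace.iterdir():
        if entry.name == ".git":
            continue  # keep the object database (and submodule data in it)
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    run(["fetch", "origin"], workspace)  # assumed remote name
    run(["checkout", "--force", commit], workspace)
    run(["submodule", "update", "--init", "--recursive"], workspace)


def hard_clean(workspace, repo_url, commit, run=run_git):
    """Delete the whole workspace and clone again from scratch."""
    shutil.rmtree(workspace, ignore_errors=True)
    run(["clone", repo_url, str(workspace)], workspace.parent)
    run(["checkout", "--force", commit], workspace)
    run(["submodule", "update", "--init", "--recursive"], workspace)


def prepare_workspace(workspace, repo_url, commit, run=run_git):
    """Try the soft clean first; fall back to the hard clean on any error."""
    workspace = Path(workspace)
    if (workspace / ".git").is_dir():
        try:
            soft_clean(workspace, commit, run)
            return "soft"
        except (subprocess.CalledProcessError, OSError):
            pass  # e.g. a damaged .git database: nuke the whole dir
    hard_clean(workspace, repo_url, commit, run)
    return "hard"
```

Calling prepare_workspace(path, repo_url, commit) returns "soft" or "hard"
depending on which path was taken, which makes it easy to log how often
the fallback is actually needed.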
