mxnet-dev mailing list archives

From Chris Olivier <cjolivie...@gmail.com>
Subject Re: [Proposal] Stabilizing Apache MXNet CI build system
Date Wed, 08 Nov 2017 22:43:29 GMT
+1

On Wed, Nov 8, 2017 at 2:40 PM Meghna Baijal <meghnabaijal2017@gmail.com>
wrote:

> Thanks for the active discussion on the document for the new CI for MXNet.
> Now that many of you have reviewed it, do you think I should start a vote
> on which framework the community wants to move forward with?
>
> Thanks,
> Meghna
>
> On Mon, Nov 6, 2017 at 6:59 PM, Chris Olivier <cjolivier01@gmail.com>
> wrote:
>
> > After a decision is reached, I am willing to add tasks to Apache MXNet
> > JIRA
> >
> > On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy
> > <pedro.larroy.lists@gmail.com> wrote:
> >
> > > Thanks for setting up the document guys, looks like a solid basis to
> > > start to work on!
> > >
> > > Marco, Kellen and I have already added some comments.
> > >
> > > Pedro
> > >
> > >
> > > On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal
> > > <meghnabaijal2017@gmail.com> wrote:
> > > > > Kellen, Thank you for your comments in the doc.
> > > > > Sure Steffen, I will continue to merge everyone’s comments into the
> > > > > doc and work with Pedro to finalize it.
> > > > > And then we can vote on the options.
> > > >
> > > > Thanks,
> > > > Meghna Baijal
> > > >
> > > >
> > > > > On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel
> > > > > <steffenrochel@gmail.com> wrote:
> > > >
> > > > >> Sandeep and Meghna have been working in the background, collecting
> > > > >> input and preparing a doc. I suggest we drive the discussion forward
> > > > >> and would like to ask everybody to contribute to
> > > >> https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDk
> > > >> dlavUDASzUmLjk/edit?usp=sharing
> > > >>
> > > > >> Let's converge on requirements and architecture, so we can move
> > > > >> forward with implementation.
> > > > >>
> > > > >> I would like to suggest that Pedro and Meghna lead the discussion and
> > > > >> help to resolve suggestions.
> > > > >>
> > > > >> I assume we need a vote once we have converged on a good draft, to
> > > > >> call it a plan and move forward with implementation. As we are all
> > > > >> unhappy with the current CI situation, I would also suggest a phased
> > > > >> approach, so we can get back to reliable and efficient basic CI
> > > > >> quickly and add advanced capabilities over time.
> > > >>
> > > >> Steffen
> > > >>
> > > >> On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <
> > > >> kellen.sunderland@gmail.com> wrote:
> > > >>
> > > > >> > Hey Henri, I think that's what a few of us are advocating: running a
> > > > >> > set of quick tests as part of the PR process, and then a more detailed
> > > > >> > regression test suite periodically (say every 4 hours). This fits
> > > > >> > nicely into a tagging or two-branch development system. Commits will
> > > > >> > be tagged (or merged into a stable branch) as soon as they pass the
> > > > >> > detailed regression testing.
> > > >> >
> > > > >> > On Wed, Nov 1, 2017 at 9:07 PM, Hen <bayard@apache.org> wrote:
> > > >> >
> > > > >> > > Random question - can the CI be split such that the Apache CI is
> > > > >> > > doing a basic set of checks on that hardware, and is hooked to a PR,
> > > > >> > > while there is a larger "Is trunk good for release?" test that is
> > > > >> > > running periodically rather than on every PR?
> > > > >> > >
> > > > >> > > ie: do we need each PR to be run on varied hardware, or can we have
> > > > >> > > this two tier approach?
> > > >> > >
> > > >> > > Hen
> > > >> > >
> > > >> > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> > > >> > > sandeep.krishna98@gmail.com> wrote:
> > > >> > >
> > > >> > > > Hello all,
> > > >> > > >
> > > > >> > > > I am hereby opening up a discussion thread on how we can stabilize
> > > > >> > > > the Apache MXNet CI build system.
> > > >> > > >
> > > >> > > > Problems:
> > > >> > > >
> > > >> > > > ========
> > > >> > > >
> > > > >> > > > Recently, we have seen the following issues with the Apache MXNet CI
> > > > >> > > > build system:
> > > > >> > > >
> > > > >> > > >    1. The Apache Jenkins master is overloaded, and we see issues such
> > > > >> > > >    as being unable to trigger builds and difficulty loading and
> > > > >> > > >    viewing Blue Ocean and other Jenkins build status pages.
> > > > >> > > >    2. We are generating too many requests/interactions for the Apache
> > > > >> > > >    Infra team:
> > > > >> > > >       1. Addition/deletion of slaves: caused by scaling activity,
> > > > >> > > >       recycling, troubleshooting, or any action leading to a change
> > > > >> > > >       of slave machines.
> > > > >> > > >       2. Plugins / other Jenkins master configurations.
> > > > >> > > >       3. Experimentation on CI pipelines.
> > > > >> > > >    3. Issues are harder to debug and resolve: since access to the
> > > > >> > > >    master and the slaves is not with the same community, Infra and
> > > > >> > > >    the community must dive deep together on all action items.
> > > >> > > >
> > > >> > > > Possible Solutions:
> > > >> > > >
> > > >> > > > ==============
> > > >> > > >
> > > > >> > > >    1. Can we set up a separate Jenkins CI build system for Apache
> > > > >> > > >    MXNet outside Apache Infra?
> > > > >> > > >    2. Can we have a separate Jenkins Master in Apache Infra for
> > > > >> > > >    MXNet?
> > > > >> > > >    3. Review design of current setup, refine and fill the gaps.
> > > >> > > >
> > > >> > > > @ Mentors/Infra team/Community:
> > > >> > > >
> > > >> > > > ==========================
> > > >> > > >
> > > > >> > > > Please provide your suggestions on how we can proceed further and
> > > > >> > > > work on stabilizing the CI build systems for MXNet.
> > > > >> > > >
> > > > >> > > > Also, if the community decides on a separate Jenkins CI build
> > > > >> > > > system, what important points should be taken care of apart from
> > > > >> > > > the below:
> > > > >> > > >
> > > > >> > > >    1. Community being able to access the build page for build
> > > > >> > > >    statuses.
> > > > >> > > >    2. Committers being able to log in with Apache credentials.
> > > > >> > > >    3. Hook setup from apache/incubator-mxnet repo to Jenkins
> > > > >> > > >    master.
> > > >> > > >
> > > >> > > >
> > > > >> > > > Irrespective of the solution we come up with, I think we should
> > > > >> > > > initiate a technical design discussion on how to set up the CI
> > > > >> > > > build system, probably a one- or two-page document with the
> > > > >> > > > architecture, and review it with Infra and community members.
> > > >> > > >
> > > > >> > > > ***There were a few proposals and discussions on the Slack channel;
> > > > >> > > > to reach the wider community, I am moving that discussion formally
> > > > >> > > > to this list.
> > > >> > > >
> > > >> > > >
> > > > >> > > > My Proposal: Option 1 - Set up separate Jenkins CI build system.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > >
> > > >> > > > Sandeep
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > --
> > > >> > > > Sandeep Krishnamurthy
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
>
