mxnet-dev mailing list archives

From Meghna Baijal <meghnabaijal2...@gmail.com>
Subject Re: [Proposal] Stabilizing Apache MXNet CI build system
Date Thu, 09 Nov 2017 23:58:02 GMT
Chris,
The Windows slaves on Apache use EIPs (Elastic IPs), which makes it easier to
replace/reboot/reconnect these instances. However, for several reasons, EIPs
cannot be used for the Ubuntu slaves.
Several workarounds are being explored for this, and one such solution is
to use the AWS CodeBuild plugin with Jenkins:

1. Jenkins has a plugin for integrating with AWS CodeBuild, which can be
used to automate slave management.
2. The idea is to configure only the *Ubuntu* slaves using this plugin.
This addresses both the EIP issue and automation on Ubuntu.
3. Other platforms, such as Windows and edge devices, continue to be
configured directly through Jenkins without this plugin. This is OK
since the Windows slaves already use EIPs.

At this point, this is only at the POC stage.
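The platform split in the POC boils down to a single routing rule. The Python sketch below is only an illustration of that rule: `AgentSetup` and `agent_setup` are hypothetical names (not part of Jenkins or the CodeBuild plugin), and it assumes the only criterion is "Ubuntu goes through the plugin, everything else stays configured directly in Jenkins".

```python
from enum import Enum


class AgentSetup(Enum):
    """How a build agent would be managed under the POC (hypothetical labels)."""
    CODEBUILD_PLUGIN = "codebuild-plugin"  # provisioned through the Jenkins CodeBuild plugin
    DIRECT_JENKINS = "direct-jenkins"      # configured directly on the Jenkins master


def agent_setup(platform: str) -> AgentSetup:
    """Return the setup path for a platform name such as "ubuntu" or "windows".

    Only Ubuntu agents go through the CodeBuild plugin; Windows agents (which
    already use EIPs) and edge devices keep their direct Jenkins configuration.
    """
    if platform.strip().lower() == "ubuntu":
        return AgentSetup.CODEBUILD_PLUGIN
    return AgentSetup.DIRECT_JENKINS


for name in ("ubuntu", "windows", "edge"):
    print(name, "->", agent_setup(name).value)
```

Keeping the rule this narrow means the EIP-based Windows setup, which already works, is left untouched while only the Ubuntu path changes.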

Thanks,
Meghna Baijal

On Thu, Nov 9, 2017 at 12:23 PM, Meghna Baijal <meghnabaijal2017@gmail.com>
wrote:

> Pedro, I created a row for Buildbot in the doc. Do you want to add some
> pros and cons about it? It would be good to have all this information
> collected in one place.
>
> Meghna
>
> On Thu, Nov 9, 2017 at 4:40 AM, Larroy, Pedro <pllarroy@amazon.de> wrote:
>
>> Thanks a lot for the document and leading the discussion.
>>
>> Does anybody have experience with a build system other than Jenkins? In
>> the document we mention TeamCity as a possible option, and there is also
>> another leading open-source CI tool, Buildbot, which is not mentioned.
>>
>> I’m not sure we have strong enough evidence to make an informed decision
>> about using something other than Jenkins. Also, from the document I get the
>> impression that the drawbacks of Jenkins are fairly minor compared to those
>> of the other frameworks.
>>
>> I would be interested to hear whether somebody has used any other framework
>> in depth and is willing to vote against Jenkins, so that we can all cast an
>> informed vote.
>>
>> I don’t feel comfortable voting for Jenkins just because it is also the
>> only one I know.
>>
>> Kind regards.
>> --
>>
>> Pedro
>>
>> On 08/11/17 23:41, "Meghna Baijal" <meghnabaijal2017@gmail.com> wrote:
>>
>>     Thanks for the active discussion on the document for the new CI for MXNet.
>>     Now that many of you have reviewed it, do you think I should start a vote
>>     on which framework the community wants to move forward with?
>>
>>     Thanks,
>>     Meghna
>>
>>     On Mon, Nov 6, 2017 at 6:59 PM, Chris Olivier <cjolivier01@gmail.com> wrote:
>>
>>     > After a decision is reached, I am willing to add tasks to the Apache
>>     > MXNet JIRA.
>>     >
>>     > On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy <pedro.larroy.lists@gmail.com>
>>     > wrote:
>>     >
>>     > > Thanks for setting up the document, guys. It looks like a solid basis
>>     > > to start working on!
>>     > >
>>     > > Marco, Kellen and I have already added some comments.
>>     > >
>>     > > Pedro
>>     > >
>>     > >
>>     > > On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal
>>     > > <meghnabaijal2017@gmail.com> wrote:
>>     > > > Kellen, thank you for your comments in the doc.
>>     > > > Sure, Steffen, I will continue to merge everyone’s comments into the
>>     > > > doc and work with Pedro to finalize it.
>>     > > > Then we can vote on the options.
>>     > > >
>>     > > > Thanks,
>>     > > > Meghna Baijal
>>     > > >
>>     > > >
>>     > > > On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel <steffenrochel@gmail.com>
>>     > > > wrote:
>>     > > >
>>     > > >> Sandeep and Meghna have been working in the background, collecting input
>>     > > >> and preparing a doc. I suggest we drive the discussion forward, and I
>>     > > >> would like to ask everybody to contribute to
>>     > > >> https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDkdlavUDASzUmLjk/edit?usp=sharing
>>     > > >>
>>     > > >> Let's converge on requirements and architecture so we can move forward
>>     > > >> with implementation.
>>     > > >>
>>     > > >> I would like to suggest that Pedro and Meghna lead the discussion and
>>     > > >> help resolve suggestions.
>>     > > >>
>>     > > >> I assume we need a vote once we have converged on a good draft, so we
>>     > > >> can call it a plan and move forward with implementation. As we are all
>>     > > >> unhappy with the current CI situation, I would also suggest a phased
>>     > > >> approach, so we can quickly get back to a reliable and efficient basic CI
>>     > > >> and add advanced capabilities over time.
>>     > > >>
>>     > > >> Steffen
>>     > > >>
>>     > > >> On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <
>>     > > >> kellen.sunderland@gmail.com> wrote:
>>     > > >>
>>     > > >> > Hey Henri, I think that's what a few of us are advocating: running a set
>>     > > >> > of quick tests as part of the PR process, and then a more detailed
>>     > > >> > regression test suite periodically (say, every 4 hours). This fits nicely
>>     > > >> > into a tagging or two-branch development system. Commits would be tagged
>>     > > >> > (or merged into a stable branch) as soon as they pass the detailed
>>     > > >> > regression testing.
>>     > > >> >
>>     > > >> > On Wed, Nov 1, 2017 at 9:07 PM, Hen <bayard@apache.org> wrote:
>>     > > >> >
>>     > > >> > > Random question: can the CI be split such that the Apache CI runs a
>>     > > >> > > basic set of checks on that hardware and is hooked to each PR, while a
>>     > > >> > > larger "Is trunk good for release?" test runs periodically rather
>>     > > >> > > than on every PR?
>>     > > >> > >
>>     > > >> > > i.e., does each PR need to run on varied hardware, or can we have
>>     > > >> > > this two-tier approach?
>>     > > >> > >
>>     > > >> > > Hen
>>     > > >> > >
>>     > > >> > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
>>     > > >> > > sandeep.krishna98@gmail.com> wrote:
>>     > > >> > >
>>     > > >> > > > Hello all,
>>     > > >> > > >
>>     > > >> > > > I am hereby opening a discussion thread on how we can stabilize the
>>     > > >> > > > Apache MXNet CI build system.
>>     > > >> > > >
>>     > > >> > > > Problems:
>>     > > >> > > >
>>     > > >> > > > ========
>>     > > >> > > >
>>     > > >> > > > Recently, we have seen the following issues with the Apache MXNet CI
>>     > > >> > > > build system:
>>     > > >> > > >
>>     > > >> > > >    1. The Apache Jenkins master is overloaded, and we see issues such as
>>     > > >> > > >    being unable to trigger builds, and the Blue Ocean and other Jenkins
>>     > > >> > > >    build status pages being difficult to load and view.
>>     > > >> > > >    2. We are generating too many requests/interactions for the Apache
>>     > > >> > > >    Infra team:
>>     > > >> > > >       1. Adding/removing slaves, caused by scaling activity,
>>     > > >> > > >       recycling, troubleshooting, or any other action that changes
>>     > > >> > > >       the slave machines.
>>     > > >> > > >       2. Plugins / other Jenkins Master configurations.
>>     > > >> > > >       3. Experimentation on CI pipelines.
>>     > > >> > > >    3. Issues are harder to debug and resolve: since access to the master
>>     > > >> > > >    and the slaves is not held by the same community, Infra and the
>>     > > >> > > >    community have to dive deep together on every action item.
>>     > > >> > > >
>>     > > >> > > > Possible Solutions:
>>     > > >> > > >
>>     > > >> > > > ==============
>>     > > >> > > >
>>     > > >> > > >    1. Can we set up a separate Jenkins CI build system for Apache MXNet
>>     > > >> > > >    outside Apache Infra?
>>     > > >> > > >    2. Can we have a separate Jenkins master in Apache Infra for MXNet?
>>     > > >> > > >    3. Review the design of the current setup, refine it, and fill the
>>     > > >> > > >    gaps.
>>     > > >> > > >
>>     > > >> > > > @ Mentors/Infra team/Community:
>>     > > >> > > >
>>     > > >> > > > ==========================
>>     > > >> > > >
>>     > > >> > > > Please provide your suggestions on how we can proceed and work on
>>     > > >> > > > stabilizing the CI build system for MXNet.
>>     > > >> > > >
>>     > > >> > > > Also, if the community decides on a separate Jenkins CI build system,
>>     > > >> > > > what important points should be taken care of apart from the
>>     > > >> > > > following:
>>     > > >> > > >
>>     > > >> > > >    1. The community being able to access the build page for build
>>     > > >> > > >    statuses.
>>     > > >> > > >    2. Committers being able to log in with Apache credentials.
>>     > > >> > > >    3. A hook from the apache/incubator-mxnet repo to the Jenkins master.
>>     > > >> > > >
>>     > > >> > > >
>>     > > >> > > > Irrespective of the solution we come up with, I think we should
>>     > > >> > > > initiate a technical design discussion on how to set up the CI build
>>     > > >> > > > system: probably a 1- or 2-page document describing the architecture,
>>     > > >> > > > reviewed with Infra and community members.
>>     > > >> > > >
>>     > > >> > > > ***There were a few proposals and discussions on the Slack channel; to
>>     > > >> > > > reach the wider community, I am moving that discussion formally to
>>     > > >> > > > this list.
>>     > > >> > > >
>>     > > >> > > >
>>     > > >> > > > My proposal: Option 1, set up a separate Jenkins CI build system.
>>     > > >> > > >
>>     > > >> > > > Thanks,
>>     > > >> > > >
>>     > > >> > > > Sandeep
>>     > > >> > > >
>>     > > >> > > >
>>     > > >> > > >
>>     > > >> > > > --
>>     > > >> > > > Sandeep Krishnamurthy
>>     > > >> > > >
>>     > > >> > >
>>     > > >> >
>>     > > >>
>>     > >
>>     >
>>
>>
>> Amazon Development Center Germany GmbH
>> Berlin - Dresden - Aachen
>> main office: Krausenstr. 38, 10117 Berlin
>> Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
>> Ust-ID: DE289237879
>> Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
>>
>
>
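The two-tier approach discussed in the thread (quick checks gating each PR, a fuller regression suite run periodically, and commits tagged or merged into a stable branch once they pass) can be modeled in a few lines. This is a minimal Python sketch under stated assumptions, not a Jenkins feature: `TwoTierCI`, `quick_suite`, `regression_suite`, and the in-memory `trunk`/`stable` fields are hypothetical, and the 4-hour interval is just the example figure from the thread.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, List, Optional

# Example interval from the thread ("say every 4 hours"); an assumption, not a fixed rule.
REGRESSION_INTERVAL = timedelta(hours=4)


@dataclass
class Commit:
    sha: str
    pushed_at: datetime


@dataclass
class TwoTierCI:
    # Hypothetical test runners: each takes a commit and returns True if it passes.
    quick_suite: Callable[[Commit], bool]       # tier 1: fast checks on every PR
    regression_suite: Callable[[Commit], bool]  # tier 2: full suite on varied hardware
    trunk: List[Commit] = field(default_factory=list)
    stable: Optional[Commit] = None             # last promoted (tagged / merged to stable) commit
    last_regression: Optional[datetime] = None

    def on_pull_request(self, commit: Commit) -> bool:
        """Tier 1: merge into trunk only if the quick suite passes."""
        if self.quick_suite(commit):
            self.trunk.append(commit)
            return True
        return False

    def on_timer(self, now: datetime) -> Optional[Commit]:
        """Tier 2: at most once per interval, run the full suite on trunk's head.

        Returns the promoted commit, or None if nothing was promoted.
        """
        if not self.trunk:
            return None
        if self.last_regression is not None and now - self.last_regression < REGRESSION_INTERVAL:
            return None  # too soon since the previous regression run
        self.last_regression = now
        head = self.trunk[-1]
        if self.regression_suite(head):
            self.stable = head  # stands in for tagging or merging into a stable branch
            return head
        return None  # stable stays at the last known-good commit


t0 = datetime(2017, 11, 1, 12, 0)
ci = TwoTierCI(quick_suite=lambda c: not c.sha.startswith("bad"),
               regression_suite=lambda c: c.sha != "c2")
ci.on_pull_request(Commit("c1", t0))
ci.on_timer(t0)                          # c1 passes the full suite and is promoted
ci.on_pull_request(Commit("c2", t0 + timedelta(hours=1)))
ci.on_timer(t0 + timedelta(hours=4))     # c2 fails, so stable remains c1
print(ci.stable.sha)
```

Testing only trunk's head keeps the expensive hardware matrix off the per-PR path; the trade-off is that a failed regression run tells you a range of commits is bad, not which one, and `stable` simply stays at the last known-good commit.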
