mxnet-dev mailing list archives

From Sebastian <ssc.o...@googlemail.com>
Subject Re: [Proposal] Stabilizing Apache MXNet CI build system
Date Thu, 26 Oct 2017 06:05:00 GMT
@meghana @pedro let me know if you need someone with a mentor hat to 
open tickets or send mail to infra, happy to help here.

Best,
Sebastian

On 25.10.2017 23:18, sandeep krishnamurthy wrote:
> Thank you, everyone, for the discussion, proposal, and the vote.
> 
> The majority of community members see that the current CI system for Apache
> MXNet has issues with scaling and with diverse test environments. The common
> suggestion is to have a separate CI setup for Apache MXNet.
> 
> Following are the next steps:
> 
> 1. Meghana proposed she would like to take the lead on this and come up
> with an initial tech design write up covering requirements, use-cases,
> alternate solutions and a proposed solution on how we could set up the CI
> system for MXNet.
> 2. This tech design will be reviewed by the community. Following that, we
> will collaborate with the Infra team and mentors to complete the setup and
> integrate the new system with the repo, the website, and more.
> 
> @Pedro Larroy - We should sync up on how we can unify the setup you have for
> various devices with the new setup being proposed and built. Ideally, we
> should have a unified CI setup for the project, accessible to the community.
> 
> Regards,
> Sandeep
> 
> On Mon, Oct 23, 2017 at 7:29 AM, Pedro Larroy <pedro.larroy.lists@gmail.com>
> wrote:
> 
>> +1
>>
>> We (with Kellen and Marco) are already working on a CI system that verifies
>> MXNet on devices. It is a work in progress so far, but at least we are
>> checking that the build is sane on Android, different ARM flavors, and
>> Ubuntu, and we are also building PRs. We are still working on having the
>> unit tests pass on some architectures like Jetson TX2 and ARM / Raspberry Pi.
>>
>> http://ci.mxnet.amazon-ml.com/
>>
>> Agree with Steffen on creating a document with requirements and high level
>> architecture. Also I would like to have quicker feedback and as we
>> discussed before, saner unit tests. I think there's a big and nontrivial
>> amount of effort required here.
>>
>> Pedro.
>>
>> On Mon, Oct 23, 2017 at 6:43 AM, Steffen Rochel <steffenrochel@gmail.com>
>> wrote:
>>
>>> +1
>>> I support Option 1 - Set up separate Jenkins CI build system. While the
>>> Apache service is appropriate for some projects, our experience over the
>>> last 6 months is that it has not been meeting the needs of the MXNet
>>> (incubating) project. AWS has been and will continue to provide resources
>>> for such a project.
>>> Agree we should create a document summarizing the requirements and high
>>> level architecture, which should answer the question of Jenkins or
>>> alternative.
>>>
>>> Steffen
>>>
>>> On Sat, Oct 21, 2017 at 6:51 PM shiwen hu <yajiedesign@gmail.com> wrote:
>>>
>>>> +1
>>>>
>>>>
>>>> 2017-10-21 9:48 GMT+08:00 Chris Olivier <cjolivier01@gmail.com>:
>>>>
>>>>> Ok, just looking for anything that can cut a task out if possible. I do
>>>>> support not using the Apache Jenkins server anymore; it's really not been
>>>>> working out for various reasons. But having a person full time is
>>>>> something that Steffen would have to address, I imagine.
>>>>>
>>>>> On Fri, Oct 20, 2017 at 6:03 PM Mu Li <muli.cmu@gmail.com> wrote:
>>>>>
>>>>>> I didn't see a clear advantage of CodePipeline over pure Jenkins,
>>>>>> because we don't need to deploy here.
>>>>>>
>>>>>> On Fri, Oct 20, 2017 at 5:34 PM, Chris Olivier <cjolivier01@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> CodePipeline, then.  You can point it to Jenkins instances.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Oct 20, 2017 at 4:49 PM Mu Li <muli.cmu@gmail.com> wrote:
>>>>>>>
>>>>>>>> AWS CodeBuild is not an option. It doesn't support GPU instances,
>>>>>>>> Mac OS X, or Windows, not to mention the edge devices.
>>>>>>>>
>>>>>>>> On Fri, Oct 20, 2017 at 4:07 PM, Chris Olivier <cjolivier01@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Why don't we look into fully managed AWS CodeBuild? It maintains
>>>>>>>>> everything. It's also compatible with Jenkins.
>>>>>>>>>
>>>>>>>>> On Fri, Oct 20, 2017 at 1:51 PM, Tianqi Chen <tqchen@cs.washington.edu>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> Tianqi
>>>>>>>>>> On Fri, Oct 20, 2017 at 1:39 PM Mu Li <muli.cmu@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It seems that the Apache CI is quite overloaded these days, and MXNet's
>>>>>>>>>>> CI pipeline is too complex to run there. In addition, we may need to add
>>>>>>>>>>> more devices, e.g. Mac Pro and Raspberry Pi, into the server, and more
>>>>>>>>>>> tasks such as pip build. It means a lot of requests to the Infra team.
>>>>>>>>>>>
>>>>>>>>>>> We can reuse our previous Jenkins server at http://ci.mxnet.io/. But we
>>>>>>>>>>> probably need a dedicated developer to maintain it.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy
>>>>>>>>>>> <sandeep.krishna98@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>
>>>>>>>>>>>> I am hereby opening up a discussion thread on how we can stabilize
>>>>>>>>>>>> the Apache MXNet CI build system.
>>>>>>>>>>>>
>>>>>>>>>>>> Problems:
>>>>>>>>>>>>
>>>>>>>>>>>> ========
>>>>>>>>>>>>
>>>>>>>>>>>> Recently, we have seen the following issues with the Apache MXNet CI
>>>>>>>>>>>> build system:
>>>>>>>>>>>>
>>>>>>>>>>>>     1. The Apache Jenkins master is overloaded and we see issues such as
>>>>>>>>>>>>     being unable to trigger builds, and difficulty loading and viewing
>>>>>>>>>>>>     the Blue Ocean and other Jenkins build status pages.
>>>>>>>>>>>>     2. We are generating too many requests/interactions for the Apache
>>>>>>>>>>>>     Infra team:
>>>>>>>>>>>>        1. Addition/deletion of slaves: caused by scaling activity,
>>>>>>>>>>>>        recycling, troubleshooting, or any action leading to a change of
>>>>>>>>>>>>        slave machines.
>>>>>>>>>>>>        2. Plugins / other Jenkins master configurations.
>>>>>>>>>>>>        3. Experimentation on CI pipelines.
>>>>>>>>>>>>     3. Issues are harder to debug and resolve: since access to the
>>>>>>>>>>>>     master and the slaves is not with the same community, Infra and the
>>>>>>>>>>>>     community have to dive deep together on all action items.
>>>>>>>>>>>>
>>>>>>>>>>>> Possible Solutions:
>>>>>>>>>>>>
>>>>>>>>>>>> ==============
>>>>>>>>>>>>
>>>>>>>>>>>>     1. Can we set up a separate Jenkins CI build system for Apache MXNet
>>>>>>>>>>>>     outside Apache Infra?
>>>>>>>>>>>>     2. Can we have a separate Jenkins master in Apache Infra for MXNet?
>>>>>>>>>>>>     3. Review the design of the current setup, refine it, and fill the gaps.
>>>>>>>>>>>>
>>>>>>>>>>>> @ Mentors/Infra team/Community:
>>>>>>>>>>>>
>>>>>>>>>>>> ==========================
>>>>>>>>>>>>
>>>>>>>>>>>> Please provide your suggestions on how we can proceed further and work on
>>>>>>>>>>>> stabilizing the CI build systems for MXNet.
>>>>>>>>>>>>
>>>>>>>>>>>> Also, if the community decides on a separate Jenkins CI build system, what
>>>>>>>>>>>> important points should be taken care of apart from the below:
>>>>>>>>>>>>
>>>>>>>>>>>>     1. Community being able to access the build page for build statuses.
>>>>>>>>>>>>     2. Committers being able to log in with Apache credentials.
>>>>>>>>>>>>     3. Hook setup from the apache/incubator-mxnet repo to the Jenkins master.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Irrespective of the solution we come up with, I think we should initiate a
>>>>>>>>>>>> technical design discussion on how to set up the CI build system, probably
>>>>>>>>>>>> a 1 or 2 page document with the architecture, reviewed with Infra and
>>>>>>>>>>>> community members.
>>>>>>>>>>>>
>>>>>>>>>>>> ***There were a few proposals and discussions on the Slack channel; to reach
>>>>>>>>>>>> wider community members, I am moving that discussion formally to this list.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> My Proposal: Option 1 - Set up separate Jenkins CI build system.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Sandeep
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Sandeep Krishnamurthy
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 
> 
> 
