mxnet-dev mailing list archives

From Meghna Baijal <meghnabaijal2...@gmail.com>
Subject Re: [Proposal] Stabilizing Apache MXNet CI build system
Date Thu, 26 Oct 2017 06:14:18 GMT
Thanks Sandeep for driving this discussion. I am also in contact with Pedro
and his team to include their requirements.
And thank you Sebastian, I will let you know!

Meghna

On Wed, Oct 25, 2017 at 11:05 PM, Sebastian <ssc.open@googlemail.com> wrote:

> @meghna @pedro, let me know if you need someone with a mentor hat to open
> tickets or send mail to infra; happy to help here.
>
> Best,
> Sebastian
>
>
> On 25.10.2017 23:18, sandeep krishnamurthy wrote:
>
>> Thank you, everyone, for the discussion, proposal, and the vote.
>>
>> The majority of community members see that the current CI system for Apache
>> MXNet has issues with scaling and with supporting diverse test environments.
>> The common suggestion is to have a separate CI setup for Apache MXNet.
>>
>> The next steps are as follows:
>>
>> 1. Meghna has offered to take the lead and come up with an initial tech
>> design write-up covering requirements, use cases, alternative solutions,
>> and a proposed solution for how we could set up the CI system for MXNet.
>> 2. This tech design will be reviewed by the community. Following that, we
>> will collaborate with the Infra team and mentors to complete the setup,
>> including integration of the new system with the repo, the website, and
>> more.
>>
>> @Pedro Larroy - We should sync up to understand how we can unify the setup
>> you have for various devices with the new setup being proposed and built.
>> Ideally, we should have a unified CI setup for the project that is
>> accessible to the community.
>>
>> Regards,
>> Sandeep
>>
>> On Mon, Oct 23, 2017 at 7:29 AM, Pedro Larroy <pedro.larroy.lists@gmail.com>
>> wrote:
>>
>>> +1
>>>
>>> We (with Kellen and Marco) are already working on a CI system that verifies
>>> MXNet on devices. It is still a work in progress, but at least we are
>>> checking that the build is sane on Android, different ARM flavors, and
>>> Ubuntu, and we are also building PRs. We are still working on getting the
>>> unit tests to pass on some architectures like Jetson TX2 and ARM /
>>> Raspberry Pi.
>>>
>>> http://ci.mxnet.amazon-ml.com/
>>>
>>> Agree with Steffen on creating a document with requirements and high-level
>>> architecture. I would also like quicker feedback and, as we discussed
>>> before, saner unit tests. I think a large and nontrivial amount of effort
>>> is required here.
>>>
>>> Pedro.
>>>
>>> On Mon, Oct 23, 2017 at 6:43 AM, Steffen Rochel <steffenrochel@gmail.com>
>>> wrote:
>>>
>>>> +1
>>>> I support Option 1 - Set up separate Jenkins CI build system. While the
>>>> Apache service is appropriate for some projects, our experience over the
>>>> last 6 months is that it has not been meeting the needs of the MXNet
>>>> (incubating) project. AWS has been providing, and will continue to
>>>> provide, resources for such a project.
>>>>
>>>> Agree we should create a document summarizing the requirements and
>>>> high-level architecture, which should answer the question of Jenkins or
>>>> an alternative.
>>>>
>>>> Steffen
>>>>
>>>> On Sat, Oct 21, 2017 at 6:51 PM shiwen hu <yajiedesign@gmail.com> wrote:
>>>>
>>>>> +1
>>>>>
>>>>>
>>>>> 2017-10-21 9:48 GMT+08:00 Chris Olivier <cjolivier01@gmail.com>:
>>>>>
>>>>>> Ok, just looking for anything that can cut a task out if possible. I do
>>>>>> support not using the Apache Jenkins server anymore; it's really not
>>>>>> been working out for various reasons. But having a person full time is
>>>>>> something that Steffen would have to address, I imagine.
>>>>>>
>>>>>> On Fri, Oct 20, 2017 at 6:03 PM Mu Li <muli.cmu@gmail.com> wrote:
>>>>>>
>>>>>>> I don't see a clear advantage of CodePipeline over pure Jenkins, because
>>>>>>> we don't need to deploy here.
>>>>>>>
>>>>>>> On Fri, Oct 20, 2017 at 5:34 PM, Chris Olivier <cjolivier01@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> CodePipeline, then. You can point it to Jenkins instances.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Oct 20, 2017 at 4:49 PM Mu Li <muli.cmu@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> AWS CodeBuild is not an option. It doesn't support GPU instances,
>>>>>>>>> Mac OS X, or Windows, not to mention the edge devices.
>>>>>>>>>
>>>>>>>>> On Fri, Oct 20, 2017 at 4:07 PM, Chris Olivier <cjolivier01@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Why don't we look into fully managed AWS CodeBuild? It maintains
>>>>>>>>>> everything. It's also compatible with Jenkins.
>>>>>>>>>>
>>>>>>>>>> On Fri, Oct 20, 2017 at 1:51 PM, Tianqi Chen <tqchen@cs.washington.edu>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>> Tianqi
>>>>>>>>>>> On Fri, Oct 20, 2017 at 1:39 PM Mu Li <muli.cmu@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1
>>>>>>>>>>>>
>>>>>>>>>>>> It seems that the Apache CI is quite overloaded these days, and
>>>>>>>>>>>> MXNet's CI pipeline is too complex to run there. In addition, we
>>>>>>>>>>>> may need to add more devices, e.g. Mac Pro and Raspberry Pi, to
>>>>>>>>>>>> the server, and more tasks such as the pip build. That means a
>>>>>>>>>>>> lot of requests to the Infra team.
>>>>>>>>>>>>
>>>>>>>>>>>> We can reuse our previous Jenkins server at http://ci.mxnet.io/.
>>>>>>>>>>>> But we probably need a dedicated developer to maintain it.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
>>>>>>>>>>>> sandeep.krishna98@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am hereby opening up a discussion thread on how we can
>>>>>>>>>>>>> stabilize the Apache MXNet CI build system.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Problems:
>>>>>>>>>>>>> ========
>>>>>>>>>>>>>
>>>>>>>>>>>>> Recently, we have seen the following issues with the Apache
>>>>>>>>>>>>> MXNet CI build system:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     1. The Apache Jenkins master is overloaded, and we see
>>>>>>>>>>>>>     issues such as being unable to trigger builds and
>>>>>>>>>>>>>     difficulty loading and viewing the Blue Ocean and other
>>>>>>>>>>>>>     Jenkins build status pages.
>>>>>>>>>>>>>     2. We are generating too many requests to and
>>>>>>>>>>>>>     interactions with the Apache Infra team:
>>>>>>>>>>>>>        1. Addition/deletion of slaves: caused by scaling
>>>>>>>>>>>>>        activity, recycling, troubleshooting, or any action
>>>>>>>>>>>>>        leading to a change of slave machines.
>>>>>>>>>>>>>        2. Plugins / other Jenkins master configurations.
>>>>>>>>>>>>>        3. Experimentation on CI pipelines.
>>>>>>>>>>>>>     3. Issues are harder to debug and resolve: since access
>>>>>>>>>>>>>     to the master and the slaves is not held by the same
>>>>>>>>>>>>>     community, Infra and the community have to dive deep
>>>>>>>>>>>>>     together on all action items.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Possible Solutions:
>>>>>>>>>>>>> ==============
>>>>>>>>>>>>>
>>>>>>>>>>>>>     1. Can we set up a separate Jenkins CI build system for
>>>>>>>>>>>>>     Apache MXNet outside Apache Infra?
>>>>>>>>>>>>>     2. Can we have a separate Jenkins Master in Apache Infra
>>>>>>>>>>>>>     for MXNet?
>>>>>>>>>>>>>     3. Review design of current setup, refine and fill the
>>>>>>>>>>>>>     gaps.
>>>>>>
>>>>>>>
>>>>>>>>>>>>> @ Mentors/Infra team/Community:
>>>>>>>>>>>>> ==========================
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please provide your suggestions on how we can proceed further
>>>>>>>>>>>>> and work on stabilizing the CI build system for MXNet.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, if the community decides on a separate Jenkins CI build
>>>>>>>>>>>>> system, what important points should be taken care of apart
>>>>>>>>>>>>> from the ones below?
>>>>>>>>>>>>>
>>>>>>>>>>>>>     1. Community being able to access the build page for
>>>>>>>>>>>>>     build statuses.
>>>>>>>>>>>>>     2. Committers being able to log in with Apache
>>>>>>>>>>>>>     credentials.
>>>>>>>>>>>>>     3. Hook setup from the apache/incubator-mxnet repo to the
>>>>>>>>>>>>>     Jenkins master.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Irrespective of the solution we come up with, I think we
>>>>>>>>>>>>> should initiate a technical design discussion on how to set
>>>>>>>>>>>>> up the CI build system, probably a one- or two-page document
>>>>>>>>>>>>> describing the architecture, to be reviewed with Infra and
>>>>>>>>>>>>> community members.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ***There were a few proposals and discussions on the Slack
>>>>>>>>>>>>> channel; to reach a wider set of community members, I am
>>>>>>>>>>>>> moving that discussion formally to this list.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> My Proposal: Option 1 - Set up separate Jenkins CI build
>>>>>>>>>>>>> system.
>>>>>>>
>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sandeep
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Sandeep Krishnamurthy
>>>>>>>>>>>>>
>>>>>>>>>>>>>
