mxnet-dev mailing list archives

From Marco de Abreu <marco.g.ab...@gmail.com>
Subject Re: [DISCUSS] CI Access Control
Date Wed, 23 Oct 2019 23:18:36 GMT
Added all three of you. You might have to relog.

Przemysław Trędak <ptrendx@apache.org> wrote on Thu, 24 Oct 2019, 01:15:

> Oooh, that is why we did not see this button... Could you add me and Dick?
>
> Thank you!
> Przemek
>
> On 2019/10/23 23:11:32, Marco de Abreu <marco.g.abreu@gmail.com> wrote:
> > We can't use the role feature of GitHub, thus committers have to be added
> > manually by an admin.
> >
> > Lausen, Leonard <lausen@amazon.de.invalid> wrote on Thu, 24 Oct 2019,
> > 00:55:
> >
> > > Hi Marco,
> > >
> > > do you mean retriggering PRs should be possible for all members of the
> > > https://github.com/orgs/apache/teams/mxnet-committers/members team?
> > > Unfortunately, it doesn't work for me (even though I log in to the CI
> > > via my GitHub account). The retrigger button simply doesn't show up.
> > >
> > > Are any further steps required?
> > >
> > > Best regards
> > > Leonard
> > >
> > > On Tue, 2019-09-17 at 14:47 +0200, Marco de Abreu wrote:
> > > > Hi Sheng,
> > > >
> > > > while I'm in general all in favour of widening access to distribute
> > > > the tasks, the situation around the CI system in particular is a bit
> > > > more difficult.
> > > >
> > > > As far as I know, the creation of the CI system is not automated,
> > > > versioned, backed up or safeguarded. This means that if somebody makes
> > > > a change that breaks something, we're left with a broken system we
> > > > can't recover from. Thus, I have preferred to restrict access as much
> > > > as possible (at least to Prod) to prevent these situations. While #1
> > > > and #2 are already possible today (we have two roles, for committers
> > > > and regular users, that allow this), #3 and #4 come with a significant
> > > > risk to the stability of the system.
> > > >
> > > > As soon as a job is added or changed, a lot of things happen in
> > > > Jenkins - one of these tasks is the SCM scan, which tries to determine
> > > > the branches the job should run on. For somebody who is inexperienced,
> > > > the first pitfall is that suddenly hundreds of jobs are spawned, which
> > > > will certainly overload Jenkins and render it unusable. There are a
> > > > lot of tricks and I could elaborate on them, but the bottom line is
> > > > that the configuration interface of Jenkins is far from foolproof and
> > > > poses a significant risk if accessed by somebody who doesn't know
> > > > exactly what they're doing - that is, we would need to design some
> > > > kind of training, and even that would not safeguard us from these
> > > > fatal events.
> > > >
> > > > There's also the whole security aspect around the generation of
> > > > user-facing artifacts by CI/CD and the possibility of them being
> > > > tampered with, but I don't think I have to elaborate on that.
> > > >
> > > > With regard to #4 especially, I'd say that somebody just upgrading the
> > > > system or changing plugins carries an even bigger risk. Plugins are
> > > > notoriously unsafe, and system updates have also been shown not to go
> > > > smoothly. I'd argue that changes to the system should only be made by
> > > > its administrators, since they have a better overview of everything
> > > > that is currently going on while also having full access (backups
> > > > before making changes, SSH access, log access, metric access, etc.) to
> > > > debug errors. In the end we shouldn't forget that this is a production
> > > > system - usually, nobody would be able to touch it at all, but we're
> > > > not in a perfect world, so I'd say we should restrict it to a bare
> > > > minimum in the form of admins.
> > > >
> > > > So while I certainly understand and encourage distributing access, I
> > > > don't feel comfortable widening the access to such a critical
> > > > production system. It being down means that development on GitHub is
> > > > fully halted, which is really problematic since we don't have rollback
> > > > mechanisms.
> > > >
> > > > Best regards,
> > > > marco
> > > >
> > > > On Sun, Sep 15, 2019 at 6:40 AM Sheng Zha <zhasheng@apache.org> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'd like to initiate discussion on how access control should be
> > > > > managed for the CI system. The hope is that we can present the
> > > > > conclusion of this discussion as the recommendation and request to
> > > > > the donors of the CI system from Amazon.
> > > > >
> > > > > The specific aspects I'd like to discuss are the abilities to:
> > > > > 1. trigger PR validation and nightly jobs.
> > > > > 2. trigger continuous delivery jobs, such as for binary releases in
> > > > > pip, maven, and dockerhub.
> > > > > 3. add jobs to the CI system.
> > > > > 4. maintain and manage the CI system, such as system upgrades and
> > > > > Jenkins plugin installation.
> > > > >
> > > > > Given that we already have GitHub SSO enabled on the Jenkins CI, I
> > > > > suggest the following authentication levels for these items:
> > > > > 1. all authenticated GitHub users.
> > > > > 2-4. all MXNet committers.
> > > > >
> > > > > What do you think? If you have more aspects that you wish to
> > > > > discuss, feel free to propose.
> > > > >
> > > > > -sz
> > > > >
> > >
> >
>
