mxnet-dev mailing list archives

From kellen sunderland <kellen.sunderl...@gmail.com>
Subject Re: Adding AMD CPU to CI
Date Fri, 30 Nov 2018 17:58:22 GMT
+1 to nightly.

Given the awesome results shown by Alex for AMD cpus I think MKLDNN
actually would probably be something I'd use, even on my AMD machines.
Kudos to Intel for releasing this lib which works great on their hardware,
but still pretty well w/ AMD.  The upshot of MKLDNN supporting AMD to me is
that it makes me much more likely to support it as the default PyPi package
(discussed in another thread).  This is part of the reason I'd like to have
a sanity test in CI somewhere for AMD hardware.

Unrelated note: regarding global warming I actually partially chose
eu-west-1 to host CI because it's carbon neutral.  The cost of the CI is
significant, and although it's donated by AWS I'm glad the community is
cognizant of that.

On Fri, Nov 30, 2018 at 9:54 AM Kumar, Vikas <vikumar@amazon.com.invalid>
wrote:

> I concur. +1 for nightly for pre-release suite.
>
> On 11/30/18, 9:49 AM, "Tianqi Chen" <tqchen@cs.washington.edu> wrote:
>
>     +1 for nightly for pre-release suite, but not the CI that is triggered in
>     every test.  The best engineering practice is not to add things, but to
>     remove things so that there is nothing that can be removed.
>
>     In terms of MKLDNN, since it is an Intel product, I doubt optimizing for
>     AMD CPUs is its goal, so adding CI to guard against backward compatibility
>     is even a bit overkill, since the AMD CPU user would likely disable this
>     feature and use the original CPU version of the project.
>
>     At least we can contribute to reducing the carbon footprint and slow down
>     global warming :)
>
>     Tianqi
>
>     On Fri, Nov 30, 2018 at 9:38 AM kellen sunderland <
>     kellen.sunderland@gmail.com> wrote:
>
>     > Regarding cost, yes we could run this nightly or simply make it run an
>     > existing test suite that would make sense rather than having it
>     > duplicate a suite.
>     >
>     > On Fri, Nov 30, 2018 at 9:26 AM Kumar, Vikas <vikumar@amazon.com.invalid>
>     > wrote:
>     >
>     > > I don't think there is any downside to this proposal. I think basic
>     > > sanity CI testing on AMD processors will give an extra boost to our
>     > > tests. This adds to developer productivity and they have one less thing
>     > > to worry about. Developers have spent time in the past where they had to
>     > > manually test on AMD processors, MKLDNN being the recent instance. It's
>     > > good to have those tests in the CI pipeline.
>     > > All I see is benefit. If the $ cost is not too high for basic sanity
>     > > testing, we should do this, until and unless some strong downside is
>     > > called out.
>     > >
>     > > +1
>     > >
>     > >
>     > > On 11/29/18, 5:37 PM, "Anirudh Subramanian" <anirudh2290@gmail.com>
>     > > wrote:
>     > >
>     > >     Instruction set extensions support like AVX2, AVX512 etc. can vary
>     > >     between AMD and Intel and there can also be a time lag between when
>     > >     Intel supports it versus when AMD supports it.
>     > >     Also, in the future this setup may be useful in case MXNet supports
>     > >     AMD GPUs and AWS also happens to have support for it.
>     > >
>     > >     Anirudh
>     > >
>     > >
>     > >     On Thu, Nov 29, 2018 at 4:29 PM Marco de Abreu
>     > >     <marco.g.abreu@googlemail.com.invalid> wrote:
>     > >
>     > >     > I think it's worth a discussion to do a sanity check. While
>     > >     > generally these instructions are standardized, we also made the
>     > >     > experience with ARM that the theory and reality sometimes don't
>     > >     > match. Thus, it's always good to check.
>     > >     >
>     > >     > In the next months we are going to refactor our slave creation
>     > >     > processes. Chance Bair has been working on rewriting Windows slaves
>     > >     > from scratch (we used images that haven't really been updated for
>     > >     > 2 years - we still don't know what was done on them) and they're
>     > >     > ready soon. In the following months, we will also port our Ubuntu
>     > >     > slaves to the new method (don't have a timeline yet). Ideally, the
>     > >     > integration of AMD instances will only be a matter of running the
>     > >     > same pipeline on a different instance type. In that case, it should
>     > >     > not be a big deal.
>     > >     >
>     > >     > If there are big differences, that's already a yellow flag for
>     > >     > compatibility, but that's unlikely. But in that case, we would have
>     > >     > to make a more thorough time analysis and whether it's worth the
>     > >     > effort. Maybe, somebody else could also lend us a hand and help us
>     > >     > with adding AMD support.
>     > >     >
>     > >     > -Marco
>     > >     >
>     > >     > On Fri, Nov 30, 2018, 01:22 Hao Jin <hjjn.amzn@gmail.com> wrote:
>     > >     >
>     > >     > > f16c is also an instruction set supported by both brands' recent
>     > >     > > CPUs just like x86, AVX, SSE etc., and any difference in behaviors
>     > >     > > (quite impossible to happen or it will be a major defect) would
>     > >     > > most likely be caused by the underlying hardware implementation,
>     > >     > > so still, adding AMD instances is not adding much value here.
>     > >     > > Hao
>     > >     > >
>     > >     > > On Thu, Nov 29, 2018 at 7:03 PM kellen sunderland <
>     > >     > > kellen.sunderland@gmail.com> wrote:
>     > >     > >
>     > >     > > > Just looked at the mf16c work and wanted to mention Rahul
>     > >     > > > clearly _was_ thinking about AMD users in that PR.
>     > >     > > >
>     > >     > > > On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
>     > >     > > > kellen.sunderland@gmail.com> wrote:
>     > >     > > >
>     > >     > > > > From my perspective we're developing a few features like mf16c
>     > >     > > > > and MKLDNN integration specifically for Intel CPUs.  It wouldn't
>     > >     > > > > hurt to make sure those changes also run properly on AMD cpus.
>     > >     > > > >
>     > >     > > > > On Thu, Nov 29, 2018, 3:38 PM Hao Jin <hjjn.amzn@gmail.com> wrote:
>     > >     > > > >
>     > >     > > > >> I'm a bit confused about why we need extra functionality tests
>     > >     > > > >> just for AMD CPUs, aren't AMD CPUs supporting roughly the same
>     > >     > > > >> instruction sets as the Intel ones? In the very impossible case
>     > >     > > > >> that something working on Intel CPUs does not function on AMD
>     > >     > > > >> CPUs (or vice versa), it would most likely be related to the
>     > >     > > > >> underlying hardware implementation of the same ISA, to which we
>     > >     > > > >> definitely do not have a good solution. So I don't think
>     > >     > > > >> performing extra tests on the functional aspect of the system on
>     > >     > > > >> AMD CPUs is adding any value.
>     > >     > > > >> Hao
>     > >     > > > >>
>     > >     > > > >> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu <sethman@amazon.com.invalid>
>     > >     > > > >> wrote:
>     > >     > > > >>
>     > >     > > > >> > +1
>     > >     > > > >> >
>     > >     > > > >> > On 11/29/18, 2:39 PM, "Alex Zai" <azai91@gmail.com> wrote:
>     > >     > > > >> >
>     > >     > > > >> >     What are people's thoughts on having AMD machines tested on
>     > >     > > > >> >     the CI? AMD machines are now available on AWS.
>     > >     > > > >> >
>     > >     > > > >> >     Best,
>     > >     > > > >> >     Alex
>     > >     > > > >> >
>     > >     > > > >> >
>     > >     > > > >> >
>     > >     > > > >>
>     > >     > > > >
>     > >     > > >
>     > >     > >
>     > >     >
>     > >
>     > >
>     > >
>     >
>
>
>
