mxnet-dev mailing list archives

From kellen sunderland <kellen.sunderl...@gmail.com>
Subject Re: Adding AMD CPU to CI
Date Fri, 30 Nov 2018 18:32:55 GMT
Damn, knew I should have double-checked! Oh well, it's also carbon neutral.

On Fri, Nov 30, 2018 at 10:27 AM Pedro Larroy <pedro.larroy.lists@gmail.com>
wrote:

> Agree with Tianqi and Hao. Adding AMD brings no value and increases
> complexity and CI cost. The instruction sets are the same. For
> benchmarking it might make sense though.
>
> Pedro
>
> > On 30. Nov 2018, at 18:19, Tianqi Chen <tqchen@cs.washington.edu> wrote:
> >
> > I still think it is overkill to add AMD CPU to the CI, given the
> > additional cost it could bring and the little additional information we
> > can get out of it.
> >
> > A middle ground is to add AMD CPU to a nightly build or final sweep before
> > release. If we find a case where AMD CPU really makes a difference, then
> > we add it to the CI.
> >
> > Tianqi
> >
> >> On Thu, Nov 29, 2018 at 6:29 PM Hao Jin <hjjn.amzn@gmail.com> wrote:
> >>
> >> For CPUs, the supported instruction sets may also vary between the same
> >> manufacturer's different product lines of the same generation
> >> (Skylake-SP versus Skylake).
> >> For the same instruction set, the two manufacturers should both have a
> >> working version of the hardware implementation. If any of the
> >> implementations does not work, then the chip would not even be
> >> considered to be functioning properly.
> >> If some AMD CPUs only support up to AVX2 instruction sets, they would
> >> just function in the same way as an Intel CPU that supports up to AVX2
> >> instruction sets. The performance may vary, but the capability and
> >> behavior of the two chips would be the same when given the same machine
> >> code.
> >> For AMD GPUs it's a totally different story, as AMD GPUs do not share
> >> the same instruction sets with the NVIDIA ones, thus testing on AMD GPUs
> >> (if we do have support for them) would definitely add value.
> >> Hao
> >>
> >> On Thu, Nov 29, 2018 at 8:37 PM Anirudh Subramanian <anirudh2290@gmail.com>
> >> wrote:
> >>
> >>> Support for instruction set extensions like AVX2, AVX512 etc. can vary
> >>> between AMD and Intel and there can also be a time lag between when
> >>> Intel supports it versus when AMD supports it.
> >>> Also, in the future this setup may be useful in case MXNet supports AMD
> >>> GPUs and AWS also happens to have support for it.
> >>>
> >>> Anirudh
> >>>
> >>>
> >>> On Thu, Nov 29, 2018 at 4:29 PM Marco de Abreu
> >>> <marco.g.abreu@googlemail.com.invalid> wrote:
> >>>
> >>>> I think it's worth a discussion as a sanity check. While these
> >>>> instructions are generally standardized, our experience with ARM
> >>>> showed that theory and reality sometimes don't match. Thus, it's
> >>>> always good to check.
> >>>>
> >>>> In the next months we are going to refactor our slave creation
> >>>> processes. Chance Bair has been working on rewriting Windows slaves
> >>>> from scratch (we used images that haven't really been updated for 2
> >>>> years - we still don't know what was done on them) and they'll be
> >>>> ready soon. In the following months, we will also port our Ubuntu
> >>>> slaves to the new method (don't have a timeline yet). Ideally, the
> >>>> integration of AMD instances will only be a matter of running the same
> >>>> pipeline on a different instance type. In that case, it should not be
> >>>> a big deal.
> >>>>
> >>>> If there are big differences, that's already a yellow flag for
> >>>> compatibility, but that's unlikely. In that case, we would have to
> >>>> analyze the time required more thoroughly and decide whether it's
> >>>> worth the effort. Maybe somebody else could also lend us a hand and
> >>>> help us with adding AMD support.
> >>>>
> >>>> -Marco
> >>>>
> >>>> On Fri, 30 Nov 2018, 01:22, Hao Jin <hjjn.amzn@gmail.com> wrote:
> >>>>
> >>>>> f16c is also an instruction set supported by both brands' recent
> >>>>> CPUs, just like x86, AVX, SSE etc., and any difference in behavior
> >>>>> (highly unlikely, since it would be a major defect) would most
> >>>>> likely be caused by the underlying hardware implementation, so
> >>>>> still, adding AMD instances is not adding much value here.
> >>>>> Hao
> >>>>>
> >>>>> On Thu, Nov 29, 2018 at 7:03 PM kellen sunderland <
> >>>>> kellen.sunderland@gmail.com> wrote:
> >>>>>
> >>>>>> Just looked at the mf16c work and wanted to mention Rahul clearly
> >>>>>> _was_ thinking about AMD users in that PR.
> >>>>>>
> >>>>>> On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
> >>>>>> kellen.sunderland@gmail.com> wrote:
> >>>>>>
> >>>>>>> From my perspective we're developing a few features like mf16c
> >>>>>>> and MKLDNN integration specifically for Intel CPUs.  It wouldn't
> >>>>>>> hurt to make sure those changes also run properly on AMD CPUs.
> >>>>>>>
> >>>>>>> On Thu, Nov 29, 2018, 3:38 PM Hao Jin <hjjn.amzn@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> I'm a bit confused about why we need extra functionality tests
> >>>>>>>> just for AMD CPUs. Don't AMD CPUs support roughly the same
> >>>>>>>> instruction sets as the Intel ones? In the very unlikely case
> >>>>>>>> that something working on Intel CPUs does not function on AMD
> >>>>>>>> CPUs (or vice versa), it would most likely be related to the
> >>>>>>>> underlying hardware implementation of the same ISA, for which we
> >>>>>>>> definitely do not have a good solution. So I don't think
> >>>>>>>> performing extra tests on the functional aspects of the system
> >>>>>>>> on AMD CPUs adds any value.
> >>>>>>>> Hao
> >>>>>>>>
> >>>>>>>> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu
> >>>>>>>> <sethman@amazon.com.invalid> wrote:
> >>>>>>>>
> >>>>>>>>> +1
> >>>>>>>>>
> >>>>>>>>> On 11/29/18, 2:39 PM, "Alex Zai" <azai91@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>    What are people's thoughts on having AMD machines tested on
> >>>>>>>>>    the CI? AMD machines are now available on AWS.
> >>>>>>>>>
> >>>>>>>>>    Best,
> >>>>>>>>>    Alex
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>
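
A minimal sketch of the capability argument made in the quoted thread: what decides whether a code path can run is the set of instruction set extensions a CPU reports, not the vendor. It assumes x86 Linux, where /proc/cpuinfo has a "flags" line with names such as avx2, f16c and avx512f. The function names, the sample texts and the REQUIRED list are hypothetical and are not taken from MXNet or its CI.

```python
from typing import Iterable, Set


def parse_cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. a non-x86 machine)


def missing_features(flags: Set[str], required: Iterable[str]) -> Set[str]:
    """Return the required extensions that the CPU does not report."""
    return {name for name in required if name not in flags}


# Made-up, heavily shortened samples; real flag lists are much longer.
SAMPLE_A = "vendor_id\t: GenuineIntel\nflags\t\t: fpu sse sse2 avx avx2 f16c avx512f\n"
SAMPLE_B = "vendor_id\t: AuthenticAMD\nflags\t\t: fpu sse sse2 avx avx2 f16c\n"

# Hypothetical set of extensions a build might rely on.
REQUIRED = ["avx2", "f16c"]

if __name__ == "__main__":
    for name, text in (("A", SAMPLE_A), ("B", SAMPLE_B)):
        gaps = missing_features(parse_cpu_flags(text), REQUIRED)
        print(name, "missing:", sorted(gaps))
```

On a real machine the text would come from reading /proc/cpuinfo. Two hosts that report the same required flags are expected to accept the same machine code, which is the basis for the argument that a separate functional test run per vendor adds little.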
