mxnet-dev mailing list archives

From Pedro Larroy <pedro.larroy.li...@gmail.com>
Subject Re: Adding AMD CPU to CI
Date Fri, 30 Nov 2018 18:56:16 GMT
I think just adding AMD is not the right level of abstraction. Testing and benchmarking with
different CPU flags / -march settings (e.g. AVX2, SSE2) brings value, in my opinion. Just testing
another vendor of a compatible CPU doesn't.

Pedro
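
A minimal sketch of the flag-based idea above, assuming a Linux host where
/proc/cpuinfo lists CPU feature flags. The tier names and the FEATURE_TIERS
mapping are hypothetical illustrations, not existing MXNet CI jobs. The point
is that test selection keys off the instruction set extensions a machine
reports, not off its vendor string.

```python
from pathlib import Path

# Hypothetical test tiers keyed by the CPU flags they require.
# These names are illustrative only and do not correspond to real CI jobs.
FEATURE_TIERS = {
    "sse2": {"sse2"},
    "avx2": {"avx2", "fma"},
    "f16c": {"f16c"},
    "avx512": {"avx512f"},
}


def parse_cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def runnable_tiers(flags):
    """Return the sorted tier names whose required flags are all present."""
    return sorted(name for name, needed in FEATURE_TIERS.items() if needed <= flags)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    print(runnable_tiers(parse_cpu_flags(text)))
```

Under this scheme an AMD and an Intel machine that report the same flags
would be assigned the same tiers, so the vendor only matters where the
reported flags differ.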

> On 30. Nov 2018, at 19:32, kellen sunderland <kellen.sunderland@gmail.com> wrote:
> 
> Damn, knew I should have double-checked!  Oh well, it's also carbon neutral.
> 
> On Fri, Nov 30, 2018 at 10:27 AM Pedro Larroy <pedro.larroy.lists@gmail.com>
> wrote:
> 
>> Agree with Tianqi and Hao. Adding AMD brings no value and increases
>> complexity and CI cost. The instruction sets are the same. For
>> benchmarking it might make sense though.
>> 
>> Pedro
>> 
>>> On 30. Nov 2018, at 18:19, Tianqi Chen <tqchen@cs.washington.edu> wrote:
>>> 
>>> I still think it is overkill to add AMD CPU to the CI, given the
>>> additional cost it could bring and how little additional information we
>>> would get out of it.
>>> 
>>> A middle ground is to add AMD CPU to a nightly build or a final sweep
>>> before release. If we find a case where AMD CPU really makes a
>>> difference, then we add it to the CI.
>>> 
>>> Tianqi
>>> 
>>>> On Thu, Nov 29, 2018 at 6:29 PM Hao Jin <hjjn.amzn@gmail.com> wrote:
>>>> 
>>>> For CPUs, the supported instruction sets may also vary between different
>>>> product lines of the same manufacturer and generation (Skylake-SP
>>>> versus Skylake).
>>>> For the same instruction set, both manufacturers should have a working
>>>> hardware implementation. If either implementation did not work, the chip
>>>> would not even be considered to be functioning properly.
>>>> If some AMD CPUs only support instruction sets up to AVX2, they would
>>>> function in the same way as an Intel CPU that supports up to AVX2.
>>>> The performance may vary, but the capability and behavior of the two
>>>> chips would be the same when given the same machine code.
>>>> For AMD GPUs it's a totally different story, as AMD GPUs do not share
>>>> the same instruction sets as the NVIDIA ones, so testing on AMD GPUs (if
>>>> we do have support for them) would definitely add value.
>>>> Hao
>>>> 
>>>> On Thu, Nov 29, 2018 at 8:37 PM Anirudh Subramanian
>>>> <anirudh2290@gmail.com> wrote:
>>>> 
>>>>> Support for instruction set extensions like AVX2, AVX512, etc. can vary
>>>>> between AMD and Intel, and there can also be a time lag between when
>>>>> Intel supports an extension and when AMD supports it.
>>>>> Also, in the future this setup may be useful in case MXNet supports AMD
>>>>> GPUs and AWS also happens to have support for them.
>>>>> 
>>>>> Anirudh
>>>>> 
>>>>> 
>>>>> On Thu, Nov 29, 2018 at 4:29 PM Marco de Abreu
>>>>> <marco.g.abreu@googlemail.com.invalid> wrote:
>>>>> 
>>>>>> I think it's worth a discussion to do a sanity check. While these
>>>>>> instructions are generally standardized, our experience with ARM showed
>>>>>> that theory and reality sometimes don't match. Thus, it's always good
>>>>>> to check.
>>>>>> 
>>>>>> In the next months we are going to refactor our slave creation
>>>>>> processes. Chance Bair has been working on rewriting the Windows slaves
>>>>>> from scratch (we used images that haven't really been updated for 2
>>>>>> years - we still don't know what was done on them) and they will be
>>>>>> ready soon. In the following months, we will also port our Ubuntu
>>>>>> slaves to the new method (we don't have a timeline yet). Ideally, the
>>>>>> integration of AMD instances will only be a matter of running the same
>>>>>> pipeline on a different instance type. In that case, it should not be a
>>>>>> big deal.
>>>>>> 
>>>>>> If there are big differences, that's already a yellow flag for
>>>>>> compatibility, but that's unlikely. In that case, we would have to
>>>>>> make a more thorough analysis of the time required and whether it's
>>>>>> worth the effort. Maybe somebody else could also lend us a hand and
>>>>>> help us with adding AMD support.
>>>>>> 
>>>>>> -Marco
>>>>>> 
>>>>>> On Fri, Nov 30, 2018, 01:22 Hao Jin <hjjn.amzn@gmail.com> wrote:
>>>>>> 
>>>>>>> f16c is also an instruction set supported by both brands' recent CPUs,
>>>>>>> just like x86, AVX, SSE, etc., and any difference in behavior (nearly
>>>>>>> impossible, or it would be a major defect) would most likely be
>>>>>>> caused by the underlying hardware implementation, so adding AMD
>>>>>>> instances is still not adding much value here.
>>>>>>> Hao
>>>>>>> 
>>>>>>> On Thu, Nov 29, 2018 at 7:03 PM kellen sunderland <
>>>>>>> kellen.sunderland@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Just looked at the mf16c work and wanted to mention that Rahul clearly
>>>>>>>> _was_ thinking about AMD users in that PR.
>>>>>>>> 
>>>>>>>> On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
>>>>>>>> kellen.sunderland@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> From my perspective we're developing a few features like mf16c and
>>>>>>>>> MKLDNN integration specifically for Intel CPUs.  It wouldn't hurt to
>>>>>>>>> make sure those changes also run properly on AMD CPUs.
>>>>>>>>> 
>>>>>>>>> On Thu, Nov 29, 2018, 3:38 PM Hao Jin <hjjn.amzn@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> I'm a bit confused about why we need extra functionality tests just
>>>>>>>>>> for AMD CPUs. Don't AMD CPUs support roughly the same instruction
>>>>>>>>>> sets as the Intel ones? In the very unlikely case that something
>>>>>>>>>> working on Intel CPUs does not function on AMD CPUs (or vice
>>>>>>>>>> versa), it would most likely be related to the underlying hardware
>>>>>>>>>> implementation of the same ISA, for which we definitely do not have
>>>>>>>>>> a good solution. So I don't think performing extra functional tests
>>>>>>>>>> on AMD CPUs adds any value.
>>>>>>>>>> Hao
>>>>>>>>>> 
>>>>>>>>>> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu
>>>>>>>>>> <sethman@amazon.com.invalid> wrote:
>>>>>>>>>> 
>>>>>>>>>>> +1
>>>>>>>>>>> 
>>>>>>>>>>> On 11/29/18, 2:39 PM, "Alex Zai" <azai91@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>   What are people's thoughts on having AMD machines tested on the
>>>>>>>>>>>   CI? AMD machines are now available on AWS.
>>>>>>>>>>> 
>>>>>>>>>>>   Best,
>>>>>>>>>>>   Alex
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 
