mxnet-dev mailing list archives

From Pedro Larroy <pedro.larroy.li...@gmail.com>
Subject Re: Adding AMD CPU to CI
Date Fri, 30 Nov 2018 18:27:23 GMT
Agree with Tianqi and Hao. Adding AMD brings no value and increases complexity and CI cost.
The instruction sets are the same. For benchmarking it might make sense though.

Pedro

> On 30. Nov 2018, at 18:19, Tianqi Chen <tqchen@cs.washington.edu> wrote:
> 
> I still think it is overkill to add AMD CPU to the CI, given the additional
> cost it could bring and how little additional information we would get out
> of it.
> 
> A middle ground is to add AMD CPU to a nightly build or a final sweep before
> release. If we find a case where AMD CPU really makes a difference, then we
> add it to the CI.
> 
> Tianqi
> 
>> On Thu, Nov 29, 2018 at 6:29 PM Hao Jin <hjjn.amzn@gmail.com> wrote:
>> 
>> For CPUs, the supported instruction sets may also vary between the same
>> manufacturer's different product lines of the same generation (Skylake-SP
>> versus Skylake).
>> For the same instruction set, the two manufacturers should both have a
>> working version of the hardware implementation. If any of the
>> implementations does not work, then the chip would not even be considered
>> functioning properly.
>> If some AMD CPUs only support up to AVX2 instruction sets, they would just
>> function in the same way as an Intel CPU that supports up to AVX2
>> instruction sets. The performance may vary, but the capability and behavior
>> of the two chips would be the same when given the same machine code.
>> For AMD GPUs it's a totally different story, as AMD GPUs do not share the
>> same instruction sets with the NVIDIA ones, so testing on AMD GPUs (if we
>> do have support for them) would definitely add value.
>> Hao
>> 
>> On Thu, Nov 29, 2018 at 8:37 PM Anirudh Subramanian <anirudh2290@gmail.com> wrote:
>> 
>>> Instruction set extension support (AVX2, AVX512, etc.) can vary between
>>> AMD and Intel, and there can also be a time lag between when Intel supports
>>> an extension and when AMD supports it.
>>> Also, in the future this setup may be useful in case MXNet supports AMD
>>> GPUs and AWS also happens to have support for it.
>>> 
>>> Anirudh
>>> 
>>> 
>>> On Thu, Nov 29, 2018 at 4:29 PM Marco de Abreu
>>> <marco.g.abreu@googlemail.com.invalid> wrote:
>>> 
>>>> I think it's worth a discussion to do a sanity check. While these
>>>> instructions are generally standardized, our experience with ARM showed
>>>> that theory and reality sometimes don't match. Thus, it's always good to
>>>> check.
>>>> 
>>>> In the next months we are going to refactor our slave creation processes.
>>>> Chance Bair has been working on rewriting the Windows slaves from scratch
>>>> (we used images that haven't really been updated for 2 years - we still
>>>> don't know what was done on them) and they will be ready soon. In the
>>>> following months, we will also port our Ubuntu slaves to the new method
>>>> (we don't have a timeline yet). Ideally, the integration of AMD instances
>>>> will only be a matter of running the same pipeline on a different instance
>>>> type. In that case, it should not be a big deal.
>>>> 
>>>> If there are big differences, that's already a yellow flag for
>>>> compatibility, but that's unlikely. In that case, we would have to do a
>>>> more thorough time analysis and decide whether it's worth the effort.
>>>> Maybe somebody else could also lend us a hand and help us with adding AMD
>>>> support.
>>>> 
>>>> -Marco
>>>> 
>>>> On Fri, Nov 30, 2018, 01:22 Hao Jin <hjjn.amzn@gmail.com> wrote:
>>>> 
>>>>> f16c is also an instruction set supported by both brands' recent CPUs,
>>>>> just like x86, AVX, SSE etc., and any difference in behavior (highly
>>>>> unlikely, as it would be a major defect) would most likely be caused by
>>>>> the underlying hardware implementation, so adding AMD instances is
>>>>> still not adding much value here.
>>>>> Hao
>>>>> 
>>>>> On Thu, Nov 29, 2018 at 7:03 PM kellen sunderland <
>>>>> kellen.sunderland@gmail.com> wrote:
>>>>> 
>>>>>> Just looked at the mf16c work and wanted to mention Rahul clearly _was_
>>>>>> thinking about AMD users in that PR.
>>>>>> 
>>>>>> On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
>>>>>> kellen.sunderland@gmail.com> wrote:
>>>>>> 
>>>>>>> From my perspective we're developing a few features like mf16c and
>>>>>>> MKLDNN integration specifically for Intel CPUs. It wouldn't hurt to
>>>>>>> make sure those changes also run properly on AMD CPUs.
>>>>>>> 
>>>>>>> On Thu, Nov 29, 2018, 3:38 PM Hao Jin <hjjn.amzn@gmail.com> wrote:
>>>>>>> 
>>>>>>>> I'm a bit confused about why we need extra functionality tests just
>>>>>>>> for AMD CPUs. Don't AMD CPUs support roughly the same instruction sets
>>>>>>>> as the Intel ones? In the very unlikely case that something working on
>>>>>>>> Intel CPUs does not function on AMD CPUs (or vice versa), it would most
>>>>>>>> likely be related to the underlying hardware implementation of the same
>>>>>>>> ISA, for which we definitely do not have a good solution. So I don't
>>>>>>>> think performing extra tests on the functional aspects of the system on
>>>>>>>> AMD CPUs adds any value.
>>>>>>>> Hao
>>>>>>>> 
>>>>>>>> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu <sethman@amazon.com.invalid> wrote:
>>>>>>>> 
>>>>>>>>> +1
>>>>>>>>> 
>>>>>>>>> On 11/29/18, 2:39 PM, "Alex Zai" <azai91@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>    What are people's thoughts on having AMD machines tested on the CI?
>>>>>>>>>    AMD machines are now available on AWS.
>>>>>>>>> 
>>>>>>>>>    Best,
>>>>>>>>>    Alex
>>>>>>>>> 
