mxnet-dev mailing list archives

From Pedro Larroy <pedro.larroy.li...@gmail.com>
Subject Re: CI Update
Date Tue, 03 Dec 2019 19:11:24 GMT
Some PRs experienced build timeouts in the past. I have traced this to
saturation of the EFS volume holding the compilation cache. Once CI is
back online this problem is very likely to be solved, and you should not
see any more build timeouts.

On Tue, Dec 3, 2019 at 10:18 AM Pedro Larroy <pedro.larroy.lists@gmail.com>
wrote:

> Also, please note that one stage builds TVM with serial compilation,
> which takes a long time and hurts CI turnaround time:
>
> https://github.com/apache/incubator-mxnet/issues/16962
>
> Pedro
>
> On Tue, Dec 3, 2019 at 9:49 AM Pedro Larroy <pedro.larroy.lists@gmail.com>
> wrote:
>
>> Hi MXNet community. We are in the process of updating the base AMIs for
>> CI with an updated CUDA driver to fix the CI blockage.
>>
>> We need help from the community to diagnose some of the build errors
>> that don't seem related to the infrastructure.
>>
>> I have observed this build failure with TVM when the CUDA driver is not
>> installed in the container:
>>
>>
>> https://pastebin.com/bQA0W2U4
>>
>> CentOS GPU builds and tests seem to run with the updated AMI and the
>> changes to the container.
>>
>>
>> Thanks.
>>
>>
>> On Mon, Dec 2, 2019 at 12:11 PM Pedro Larroy <
>> pedro.larroy.lists@gmail.com> wrote:
>>
>>> Small update about CI, which is blocked.
>>>
>>> There seems to be an NVIDIA driver compatibility problem between the
>>> base AMI running on GPU instances and the NVIDIA Docker images that we
>>> use for building and testing.
>>>
>>> We are working on a fix by updating the base images, as it doesn't
>>> seem easy to fix by just changing the container.
>>>
>>> Thanks.
>>>
>>> Pedro.
>>>
>>
