airflow-dev mailing list archives

From Jarek Potiuk <Jarek.Pot...@polidea.com>
Subject Re: Travis CI random failures
Date Tue, 23 Jul 2019 09:34:30 GMT
No good news yet. We are still randomly getting 1 CPU / 3.5 GB memory
instances. Infrastructure is on it.

On Tue, Jul 23, 2019 at 10:49 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
wrote:

> It looks like we are back to the original specs. I am running tests and
> will re-enable everything if I see that it works.
>
> J.
>
> On Tue, Jul 23, 2019 at 10:34 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> wrote:
>
>> From INFRA: "I have confirmed that our builds appear to be running with
>> 3.75GB memory and 1 core currently. This does not match Travis' standard
>> specs (7.5GB and 2 cores), and I have raised a ticket with their support. I
>> will respond when we hear back from Travis."
>>
>>
>> On Tue, Jul 23, 2019 at 10:26 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>> wrote:
>>
>>> It's definitely confirmed that the problem is on Travis CI side:
>>>
>>> I re-ran the commit from before the new CI was introduced (I cherry-picked
>>> a small doc fix related to the recent Sphinx dependency update) and it
>>> fails in exactly the same way (memory and CPU problems):
>>> https://travis-ci.org/apache/airflow/builds/562450592.
>>>
>>> For now I cannot do much but wait for INFRA's response (and work on a
>>> GitLab CI replacement for Travis).
>>>
>>> I recommend bringing some popcorn. It's going to be an interesting one
>>> to watch.
>>>
>>> J.
>>>
>>> On Tue, Jul 23, 2019 at 9:43 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>>> wrote:
>>>
>>>> It's now consistent and happens pretty much every time with the old
>>>> build system, for example here:
>>>> https://travis-ci.org/apache/airflow/builds/562435992.
>>>>
>>>> I will cancel all PR builds and disable automated PR builds on Travis
>>>> until we solve the problem, as running them is pointless: new PRs will
>>>> simply queue and fail constantly.
>>>>
>>>> I opened a critical infrastructure ticket:
>>>> https://issues.apache.org/jira/browse/INFRA-18787 and I am running
>>>> some additional tests: I am running the builds from the commit before the
>>>> new CI to see whether another change since then could have caused it.
>>>>
>>>> J.
>>>>
>>>>
>>>> On Tue, Jul 23, 2019 at 8:55 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>>>> wrote:
>>>>
>>>>> Update 2: I can confirm that the same memory/resource-related issues
>>>>> happen in my Travis CI fork with the changes reverted :(
>>>>> https://travis-ci.org/potiuk/airflow/builds/562430507 . I will
>>>>> escalate it to Travis/Apache infrastructure.
>>>>>
>>>>> On Tue, Jul 23, 2019 at 8:35 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>>>>> wrote:
>>>>>
>>>>>> Update: it looks like it's Travis's problem: I reverted the CI
>>>>>> changes and we have the same CPU problem in the old build:
>>>>>> https://travis-ci.org/potiuk/airflow/jobs/562430517 .
>>>>>>
>>>>>> On Tue, Jul 23, 2019 at 8:32 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello everyone,
>>>>>>>
>>>>>>> We've started to experience some random failures on Travis related
>>>>>>> to lack of resources: these are either out-of-memory errors or a lack
>>>>>>> of CPUs to run Kubernetes builds.
>>>>>>>
>>>>>>> I tried to rerun those, thinking it was an intermittent error. It
>>>>>>> started happening yesterday and I had not seen it before, so I rather
>>>>>>> doubt it is related to the latest changes.
>>>>>>>
>>>>>>> But I do not want to risk everyone being blocked, so I am now testing
>>>>>>> on my own fork whether reverting the latest CI changes helps. I will
>>>>>>> let you know, and I will revert if I find that the old CI works
>>>>>>> reliably.
>>>>>>>
>>>>>>> In the meantime, I will cancel all outstanding builds that are
>>>>>>> blocking our queue and will test both the old CI and the new CI in our
>>>>>>> fork :( (the Travis queue limit is not helping).
>>>>>>>
>>>>>>> Can you please hold off on rebasing/pushing new PRs until I have
>>>>>>> checked it?
>>>>>>>
>>>>>>> Example failures:
>>>>>>>
>>>>>>>
>>>>>>>    - OSError: [Errno 12] Cannot allocate memory (
>>>>>>>    https://travis-ci.org/apache/airflow/jobs/562395978)
>>>>>>>    - [ERROR NumCPU]: the number of available CPUs 1 is less than
>>>>>>>    the required 2 (
>>>>>>>    https://travis-ci.org/apache/airflow/jobs/562395978)
>>>>>>>
>>>>>>>
>>>>>>> J.
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Jarek Potiuk
>>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>>>>>
>>>>>>> M: +48 660 796 129 <+48660796129>
>>>>>>> [image: Polidea] <https://www.polidea.com/>
>>>>>>>
>>>>>>>
