airflow-dev mailing list archives

From Jarek Potiuk <Jarek.Pot...@polidea.com>
Subject Re: Travis CI random failures
Date Tue, 23 Jul 2019 17:42:54 GMT
FYI: still not fixed. Others are experiencing this as well:
https://github.com/travis-ci/worker/issues/604
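
A worker that comes up below spec could be caught before the test suite starts. The following is only a minimal sketch, not part of the Airflow CI scripts: the function names and the 7.0 GiB memory floor are assumptions made for illustration, while the 2-CPU minimum comes from the NumCPU error quoted further down. It reads the CPU count and physical memory on Linux and reports any shortfall.

```python
import os
import sys

MIN_CPUS = 2        # the quoted NumCPU error says 2 CPUs are required
MIN_MEM_GIB = 7.0   # assumed floor, a little under the 7.5GB standard spec


def find_shortfalls(cpus, mem_bytes, min_cpus=MIN_CPUS, min_mem_gib=MIN_MEM_GIB):
    """Return a list of problems; an empty list means the worker looks fine."""
    problems = []
    if cpus < min_cpus:
        problems.append(f"CPUs: have {cpus}, need {min_cpus}")
    mem_gib = mem_bytes / 2**30
    if mem_gib < min_mem_gib:
        problems.append(
            f"memory: have {mem_gib:.2f} GiB, need {min_mem_gib:.2f} GiB"
        )
    return problems


def detect_resources():
    """Read the CPU count and physical memory of this machine (Linux)."""
    cpus = os.cpu_count() or 1
    mem_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    return cpus, mem_bytes


def main():
    """Print each shortfall to stderr and return a shell-style exit code."""
    issues = find_shortfalls(*detect_resources())
    for issue in issues:
        print(f"Undersized worker: {issue}", file=sys.stderr)
    return 1 if issues else 0
```

Calling `sys.exit(main())` as the first build step would make an undersized worker fail within seconds with a clear message, rather than partway through the tests with an Out of Memory error.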

On Tue, Jul 23, 2019 at 11:34 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
wrote:

> No good news yet. We are still being randomly assigned 1 CPU / 3.5GB memory
> instances. Infrastructure is on it.
>
> On Tue, Jul 23, 2019 at 10:49 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> wrote:
>
>> It looks like we are back to the original specs. I am running tests and
>> will re-enable everything if I see it works.
>>
>> J.
>>
>> On Tue, Jul 23, 2019 at 10:34 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>> wrote:
>>
>>> From INFRA: "I have confirmed that our builds appear to be running with
>>> 3.75GB memory and 1 core currently. This does not match Travis' standard
>>> specs (7.5GB and 2 cores), and I have raised a ticket with their support. I
>>> will respond when we hear back from Travis."
>>>
>>>
>>> On Tue, Jul 23, 2019 at 10:26 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>>> wrote:
>>>
>>>> It's definitely confirmed that the problem is on the Travis CI side:
>>>>
>>>> I re-ran the commit before the new CI was introduced (I cherry-picked a
>>>> small doc fix related to the recent sphinx dependency update) and it fails
>>>> in exactly the same way (memory and CPU problems):
>>>> https://travis-ci.org/apache/airflow/builds/562450592.
>>>>
>>>> For now I cannot do much but wait for INFRA's response (and work on
>>>> the GitLab CI replacement for Travis).
>>>>
>>>> I recommend bringing some popcorn. It's going to be an interesting one
>>>> to watch.
>>>>
>>>> J.
>>>>
>>>> On Tue, Jul 23, 2019 at 9:43 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>>>> wrote:
>>>>
>>>>> It's now pretty consistent and happens pretty much every time using
>>>>> the old build system - for example here:
>>>>> https://travis-ci.org/apache/airflow/builds/562435992.
>>>>>
>>>>> I will cancel all PRs and disable automated PR builds on Travis until
>>>>> we solve the problem - as it is pointless - new PRs will simply queue
>>>>> and fail constantly.
>>>>>
>>>>> I opened a critical infrastructure ticket:
>>>>> https://issues.apache.org/jira/browse/INFRA-18787 and I am running
>>>>> some additional tests - I am running the builds from the commit before
>>>>> the new CI so that I can see whether another change since then could
>>>>> cause it.
>>>>>
>>>>> J.
>>>>>
>>>>>
>>>>> On Tue, Jul 23, 2019 at 8:55 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>>>>> wrote:
>>>>>
>>>>>> Update2: I can confirm that the same memory/resource related issues
>>>>>> happen in my Travis CI forks with reverted changes :(
>>>>>> https://travis-ci.org/potiuk/airflow/builds/562430507 . I will
>>>>>> escalate it to Travis/APACHE infrastructure
>>>>>>
>>>>>> On Tue, Jul 23, 2019 at 8:35 AM Jarek Potiuk <
>>>>>> Jarek.Potiuk@polidea.com> wrote:
>>>>>>
>>>>>>> Update: it looks like it's Travis's problem: I reverted the CI
>>>>>>> changes and we have the same CPU problem in the old build:
>>>>>>> https://travis-ci.org/potiuk/airflow/jobs/562430517 .
>>>>>>>
>>>>>>> On Tue, Jul 23, 2019 at 8:32 AM Jarek Potiuk <
>>>>>>> Jarek.Potiuk@polidea.com> wrote:
>>>>>>>
>>>>>>>> Hello everyone,
>>>>>>>>
>>>>>>>> We've started to experience some random failures on Travis related
>>>>>>>> to lack of resources: those are either Out of Memory errors or a lack
>>>>>>>> of CPUs to run Kubernetes builds.
>>>>>>>>
>>>>>>>> I tried to rerun those, thinking it was an intermittent error. It
>>>>>>>> started happening yesterday and I have not seen it before, so I rather
>>>>>>>> doubt it is related to the latest changes.
>>>>>>>>
>>>>>>>> But I do not want to risk everyone being blocked, so I am now testing
>>>>>>>> on my own fork whether reverting the latest CI changes helps. I will
>>>>>>>> let you know, and will revert if I find that the old CI works in a
>>>>>>>> stable way.
>>>>>>>>
>>>>>>>> In the meantime, I will cancel all outstanding builds that are
>>>>>>>> blocking our queue and will test both the old CI and the new CI in
>>>>>>>> our fork :( (the Travis queue limit is not helping).
>>>>>>>>
>>>>>>>> Can you please hold off on rebasing/pushing new PRs until I check
>>>>>>>> it?
>>>>>>>>
>>>>>>>> Example failures:
>>>>>>>>
>>>>>>>>
>>>>>>>>    - OSError: [Errno 12] Cannot allocate memory (
>>>>>>>>    https://travis-ci.org/apache/airflow/jobs/562395978)
>>>>>>>>    - [ERROR NumCPU]: the number of available CPUs 1 is less than
>>>>>>>>    the required 2 (
>>>>>>>>    https://travis-ci.org/apache/airflow/jobs/562395978)
>>>>>>>>
>>>>>>>>
>>>>>>>> J.
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Jarek Potiuk
>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>>>>>>
>>>>>>>> M: +48 660 796 129 <+48660796129>
>>>>>>>> [image: Polidea] <https://www.polidea.com/>
>>>>>>>>
>>>>>>>>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
