airflow-dev mailing list archives

From Jarek Potiuk <Jarek.Pot...@polidea.com>
Subject Re: Travis builds in a queue for hours
Date Wed, 10 Jul 2019 20:55:57 GMT
Hello Everyone,

I have some really good news. I just had a call with the Google OSS team (Gris,
Aizhamal) and they are willing to donate VMs on Google Cloud Platform to
run CI for Airflow. To simplify the setup (and make sure it is OK
according to Apache regulations), we think we should take exactly the same
route as the Apache Beam project (Google donated 16x 16CPU machines to them).
Beam uses the machines as workers for Apache
Jenkins (https://builds.apache.org/). Apache Jenkins is one of the
CI solutions encouraged by Apache, and if we can connect workers to
the existing Apache Jenkins master, the maintenance
overhead will be minimal. Since we can follow the Apache Beam setup, I
do not expect any legal problems.

I also work very closely with the team that uses Apache Beam Jenkins
heavily so I have access to all the necessary experts to help with the
setup (and I am happy to help with that).

I really hope everyone in the community will be happy to go in that
direction. Please let me know if you have any concerns!

We do not need as many machines as Beam for sure (Beam uses the machines to
process a lot of data for tests including some load testing) but we need to
estimate the number/types of machines that we are going to need.
Fokko, Ash, others - do you have some recent numbers for the current usage
or should I open an Infrastructure ticket for it?

J

On Fri, Jun 28, 2019 at 4:50 PM Jarek Potiuk <Jarek.Potiuk@polidea.com>
wrote:

> Thanks Aizhamal! I spoke already to Gris and she confirmed that as well
> and the 8th of July date is ok for us as we will have to evaluate and
> prepare as well. Have a nice trip.
>
> J.
>
> On Fri, Jun 28, 2019 at 4:25 PM Aizhamal Nurmamat kyzy
> <aizhamal@google.com.invalid> wrote:
>
>> Hi all,
>>
>> On Thu, Jun 27, 2019 at 15:28 Jarek Potiuk <Jarek.Potiuk@polidea.com>
>> wrote:
>>
>> > Yeah. I also have a working version of a Cloud Build configuration, and
>> > we can run the tests on Cloud Build if we can get some credits from Google.
>>
>>
>> I can look into getting a small amount of credits approved for this, to
>> see if it’s useful to offload some tests to Cloud Build, or to provision
>> some VMs to run on Apache Infra.
>>
>> I am traveling at the moment, but I’ll be back in the office on July 8,
>> and I’ll try to get this done.
>>
>>
>> Thanks,
>> Aizhamal
>>
>> > And the changes from the upcoming CI image will make it much easier to run
>> > tests on any CI provider. Except for the Kubernetes tests, they are pretty
>> > much CI-agnostic. The Kubernetes tests will likely also be fixed soon.
>> >
>> > Another idea: in the future we could also run only a subset of the
>> > postgres/mysql/sqlite tests on all combinations. I think there are just a
>> > handful of tests that are specific to a backend (and we already know which
>> > ones they are - they are skipped-if).
>> >
>> > J.
>> >
>> > Principal Software Engineer
>> > Phone: +48660796129
>> >
>> > On Thu, 27 Jun 2019, 15:12 Philippe Gagnon <philgagnon1@gmail.com>
>> > wrote:
>> >
>> > > I think the combinations that you are proposing are sensible for
>> > > pre-merge checks.
>> > >
>> > > I am working on a proposal to offload extra combinations to another CI
>> > > provider (Azure DevOps specifically seems like a good candidate), either
>> > > pre- or post-merge. Ideally I'd like to run more combinations pre-merge,
>> > > but there is a trade-off to be conscious of here between development
>> > > velocity and quality assurance, which I think this issue highlights quite
>> > > well.
>> > >
>> > > Please let me know your thoughts
>> > >
>> > > Philippe
>> > >
>> > > On Thu, Jun 27, 2019 at 9:05 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>> > > wrote:
>> > >
>> > > > Agree that we should be thoughtful about others as well. In the latest
>> > > > push (a few minutes ago) of the upcoming official CI image I implemented
>> > > > the change we discussed on GitHub, where we limit the number of
>> > > > combinations we test:
>> > > >
>> > > > You can see it yourself:
>> > > > https://travis-ci.org/apache/airflow/builds/551305240
>> > > >
>> > > > Those are the combinations I propose:
>> > > >
>> > > >  Python: 3.6
>> > > >  BACKEND=mysql ENV=docker
>> > > >
>> > > >  Python: 3.6
>> > > >  BACKEND=postgres ENV=docker
>> > > >
>> > > >  Python: 3.5
>> > > >  BACKEND=sqlite ENV=docker
>> > > >
>> > > >  Python: 3.6
>> > > >  BACKEND=postgres ENV=kubernetes KUBERNETES_VERSION=v1.13.0
>> > > >
>> > > > J,
>> > > >
>> > > >
>> > > > On Thu, Jun 27, 2019 at 11:00 AM Driesprong, Fokko <fokko@driesprong.frl>
>> > > > wrote:
>> > > >
>> > > > > We got this message last year:
>> > > > >
>> > > > > > Hello, Airflow PPMC.
>> > > > > > While going through the usage statistics for our Travis CI service, I
>> > > > > > have noticed that the Airflow project is using an abnormally large
>> > > > > > amount of resources, 2600 hours per month or the equivalent of having
>> > > > > > almost 4 machines building airflow non-stop 24/7. As this is not free,
>> > > > > > but rather costing us money, I'm contacting you with the intention of
>> > > > > > figuring out ways to reduce the use of Travis for the project.
>> > > > > >
>> > > > > > We would greatly prefer that the project itself comes up with a solution
>> > > > > > to lower the usage of Travis, as we'd hate to simply turn it off for
>> > > > > > you, but the usage is at a rather severe level, totaling more than 21%
>> > > > > > of the total build time of all projects using Travis, so something
>> > > > > > actionable should be decided upon and (preferably) completed by the end
>> > > > > > of May that will reduce the consumption of Travis resources.
>> > > > > >
>> > > > > > Alternately, if you are unable to lower the pressure on Travis, the
>> > > > > > podling and/or IPMC may ask the board of directors for a separate budget
>> > > > > > for additional build nodes to cope with the added load - I'll leave this
>> > > > > > for the podling and IPMC to decide on.
>> > > > > >
>> > > > > > Please let us know when you have decided on a plan to remedy this
>> > > > > > situation.
>> > > > > >
>> > > > > > With regards,
>> > > > > > Daniel on behalf of ASF Infrastructure.
>> > > > >
>> > > > > I think more and more projects are still migrating to the ASF Travis, so
>> > > > > I think it is natural that there is more load. However, this still leaves
>> > > > > the question of whether we have to run the full matrix.
>> > > > >
>> > > > > Cheers, Fokko
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Thu, 27 Jun 2019 at 10:56, Jarek Potiuk <Jarek.Potiuk@polidea.com>
>> > > > > wrote:
>> > > > >
>> > > > > > I think we should really involve infra to increase the slot number or
>> > > > > > maybe even somehow allocate slots per project.
>> > > > > > The problem is that we cannot control what other Apache projects are
>> > > > > > doing, so even if we decrease our runtime, it's the other projects that
>> > > > > > might hold us in the queue :(
>> > > > > >
>> > > > > > J.
>> > > > > >
>> > > > > > On Thu, Jun 27, 2019 at 10:19 AM Driesprong, Fokko <fokko@driesprong.frl>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > I've noticed this at other Apache projects as well; sometimes it takes
>> > > > > > > up to 7-8 hours. The only thing we can do is reduce the runtime of the
>> > > > > > > jobs so we take fewer slots :-)
>> > > > > > >
>> > > > > > > Cheers, Fokko
>> > > > > > >
>> > > > > > > On Wed, 26 Jun 2019 at 21:59, Jarek Potiuk <Jarek.Potiuk@polidea.com>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Yep. That's what I suggested as the reason in the ticket - I guess
>> > > > > > > > INFRA are the only people who can do anything about it (increase
>> > > > > > > > concurrency? pay more for Travis :)? ).
>> > > > > > > >
>> > > > > > > > On Wed, Jun 26, 2019 at 9:51 PM Ash Berlin-Taylor <ash@apache.org>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > I asked Travis on Twitter and they said it was due to the other
>> > > > > > > > > Apache projects' build queues
>> > > > > > > > >
>> > > > > > > > > https://twitter.com/travisci/status/1143893051460526080
>> > > > > > > > >
>> > > > > > > > > -ash
>> > > > > > > > >
>> > > > > > > > > On 26 June 2019 20:48:33 BST, Jarek Potiuk <Jarek.Potiuk@polidea.com>
>> > > > > > > > > wrote:
>> > > > > > > > >>
>> > > > > > > > >> Hello everyone,
>> > > > > > > > >>
>> > > > > > > > >> For the last few days the Travis builds for the apache/airflow project
>> > > > > > > > >> have been waiting in a queue for hours. This is not a normal situation.
>> > > > > > > > >> I've opened an INFRA ticket for that:
>> > > > > > > > >> https://issues.apache.org/jira/browse/INFRA-18657
>> > > > > > > > >>
>> > > > > > > > >> J.
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > >
>> > > > > > > > Jarek Potiuk
>> > > > > > > > Polidea <https://www.polidea.com/> | Principal
Software
>> > Engineer
>> > > > > > > >
>> > > > > > > > M: +48 660 796 129 <+48660796129>
>> > > > > > > > [image: Polidea] <https://www.polidea.com/>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
