airflow-dev mailing list archives

From Aizhamal Nurmamat kyzy <aizha...@apache.org>
Subject Re: Travis builds in a queue for hours
Date Wed, 10 Jul 2019 22:11:02 GMT
Hi all,

I am still working on trying to get approvals for this, so this is not yet
a done deal. I'll keep y'all updated.

As for the CI solution to use, we have no particular preference, as long as
the community supports it and it is consistent with any Apache guidelines
for CI in their projects. Jenkins and GitLab CI both sound sensible.

The email from INFRA says that Airflow runs 2600 hours of tests per month,
or the equivalent of about 4 machines. Can the community help with a
reasonable estimate for this, so I can use it as a reference for the
request?
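
As a starting point for that estimate, INFRA's figure can be sanity-checked
with a little arithmetic. The sketch below is only a rough illustration: it
assumes a 30-day month, and the function names and the `utilization`
parameter (the fraction of time a donated machine is actually busy) are
hypothetical, not part of any Airflow or Apache tooling.

```python
import math

# Assumption: a 30-day month with machines available around the clock.
HOURS_PER_MONTH = 24 * 30


def machine_equivalents(build_hours_per_month: float) -> float:
    """Machines that would be busy 24/7 to deliver the given build hours."""
    return build_hours_per_month / HOURS_PER_MONTH


def machines_needed(build_hours_per_month: float, utilization: float) -> int:
    """Whole machines needed if each one is busy only `utilization` of the time.

    `utilization` is a hypothetical planning parameter in the range (0, 1].
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return math.ceil(machine_equivalents(build_hours_per_month) / utilization)


if __name__ == "__main__":
    # 2600 hours per month is the figure quoted in the INFRA email.
    print(round(machine_equivalents(2600), 2))
    # 0.5 is an illustrative guess, not a measured utilization.
    print(machines_needed(2600, utilization=0.5))
```

At full utilization this reproduces INFRA's "about 4 machines"; any headroom
for peak hours pushes the number up, so the utilization assumption matters
more than the raw hour count.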

Thanks!

On Wed, Jul 10, 2019 at 2:43 PM Jarek Potiuk <Jarek.Potiuk@polidea.com>
wrote:

> Yeah. GitLab CI is definitely what I would prefer as well from the
> "modernity" point of view (and one of my very close friends is a GitLab CI
> maintainer and actually the person who introduced CI to the GitLab
> offering). I have also already catalysed a discussion between GitLab and
> Apache Infrastructure about introducing GitLab CI at the "Apache" level
> (they are talking about it now, I believe).
>
> But from the Google <> Apache/procedural point of view it might simply be
> easier to follow in the footsteps of Apache Beam. It might be just a few
> clicks for Apache Infrastructure to add more machines and connect them to
> the Apache Jenkins for our project. If we have a path cleared by others,
> following it might simply be much faster.
>
> But we can try both of course. And even switch later. The Docker CI
> approach I am about to merge is designed to make it super-easy to switch
> between CI systems. Virtually ALL the build logic is in scripts in shared
> Docker images. There is basically one file per CI system to add and we can
> support Travis/Jenkins/CloudBuild/CircleCI - whatever we imagine. We can
> even support all of them at the same time :)
>
> J.
>
> On Wed, Jul 10, 2019 at 11:32 PM Bolke de Bruin <bdbruin@gmail.com> wrote:
>
> > If you need an alternative why not use a couple of gitlab-ci runners?
> > Much easier to maintain, lightweight, and much closer to what we use now.
> >
> > B.
> >
> > Sent from my iPad
> >
> > > On 10 Jul 2019 at 23:27, Bolke de Bruin <bdbruin@gmail.com> wrote:
> > >
> > > Awesome! But I hope you are not serious about using Jenkins, right? If I
> > > need to start a Holy War it would be against Jenkins.
> > >
> > > B.
> > >
> > > Sent from my iPad
> > >
> > >> On 10 Jul 2019 at 22:55, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > >>
> > >> Hello Everyone,
> > >>
> > >> I have some really good news. I just had a call with Google OSS team
> > >> (Gris, Aizhamal) and they are willing to donate VMs on Google Cloud
> > >> Platform to run CI for Airflow. In order to simplify the setup (and make
> > >> sure it is ok according to Apache regulations) we think we should go
> > >> exactly the same route as Apache Beam project (Google donated 16x 16CPU
> > >> machines for them). The route of Apache Beam is to use the machines as
> > >> workers for Apache Jenkins (https://builds.apache.org/). Apache Jenkins
> > >> is one of the encouraged CI solutions by Apache and if we can have
> > >> workers connected to the existing Jenkins master of Apache, it means
> > >> that the maintenance overhead will be pretty minimal. And we can follow
> > >> Apache Beam setup so I do not expect any legal problems.
> > >>
> > >> I also work very closely with the team that uses Apache Beam Jenkins
> > >> heavily so I have access to all the necessary experts to help with the
> > >> setup (and I am happy to help with that).
> > >>
> > >> I really hope everyone in the community will be really happy to go in
> > >> that direction - it's. Please let me know if you have any concerns!
> > >>
> > >> We do not need as many machines as Beam for sure (Beam uses the machines
> > >> to process a lot of data for tests including some load testing) but we
> > >> need to estimate the number/types of machines that we are going to need.
> > >> Fokko, Ash, others - do you have some recent numbers for the current
> > >> usage or should I open an Infrastructure ticket for it?
> > >>
> > >> J
> > >>
> > >> On Fri, Jun 28, 2019 at 4:50 PM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > >> wrote:
> > >>
> > >>> Thanks Aizhamal! I spoke already to Gris and she confirmed that as well
> > >>> and the 8th of July date is ok for us as we will have to evaluate and
> > >>> prepare as well. Have a nice trip.
> > >>>
> > >>> J.
> > >>>
> > >>> On Fri, Jun 28, 2019 at 4:25 PM Aizhamal Nurmamat kyzy
> > >>> <aizhamal@google.com.invalid> wrote:
> > >>>
> > >>>> Hi all,
> > >>>>
> > >>>> On Thu, Jun 27, 2019 at 15:28 Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > >>>> wrote:
> > >>>>
> > >>>>> Yeah. I also have a working version of Cloud build configuration and
> > >>>>> we can run the tests on cloud build if we can get some credits from
> > >>>>> Google.
> > >>>>
> > >>>>
> > >>>> I can look into getting a small amount of credits approved for this,
> > >>>> to see if it’s useful to offload some tests to Cloud Build, or to
> > >>>> provision some VMs to run on Apache Infra.
> > >>>>
> > >>>> I am traveling at the moment, but I’ll be back in the office on July 8,
> > >>>> and I’ll try to get this done.
> > >>>>
> > >>>>
> > >>>> Thanks,
> > >>>> Aizhamal
> > >>>>
> > >>>>> And the changes from the upcoming CI image will make it much easier to
> > >>>>> run tests on any CI provider. Except Kubernetes tests they are pretty
> > >>>>> much CI-agnostic. Kubernetes tests will likely be also fixed soon.
> > >>>>>
> > >>>>> Another idea: I thought that in the future we can also run only a
> > >>>>> subset of postgres/mysql/sqlite tests on all combinations. I think
> > >>>>> there are just a handful of tests that are specific to a backend (and
> > >>>>> we already know which ones they are - they are skipped-if).
> > >>>>>
> > >>>>> J.
> > >>>>>
> > >>>>> Principal Software Engineer
> > >>>>> Phone: +48660796129
> > >>>>>
> > >>>>> On Thu, 27 Jun 2019 at 15:12, Philippe Gagnon <philgagnon1@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> I think the combinations that you are proposing are sensible for
> > >>>>>> pre-merge checks.
> > >>>>>>
> > >>>>>> I am working on a proposal to offload extra combinations to another
> > >>>>>> CI provider (Azure DevOps specifically seems like a good candidate),
> > >>>>>> either pre or post merge. Ideally I'd like to run more combinations
> > >>>>>> pre-merge but there is a trade-off to be conscious of here between
> > >>>>>> development velocity and quality assurance, which I think this issue
> > >>>>>> highlights quite well.
> > >>>>>>
> > >>>>>> Please let me know your thoughts
> > >>>>>>
> > >>>>>> Philippe
> > >>>>>>
> > >>>>>> On Thu, Jun 27, 2019 at 9:05 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Agree that we should be thoughtful about others as well: in the
> > >>>>>>> latest push (a few minutes ago) of the upcoming official CI image I
> > >>>>>>> implemented the change we discussed on GitHub where we limit the
> > >>>>>>> number of combinations we test:
> > >>>>>>>
> > >>>>>>> You can see it yourself:
> > >>>>>>> https://travis-ci.org/apache/airflow/builds/551305240
> > >>>>>>>
> > >>>>>>> Those are the combinations I propose:
> > >>>>>>>
> > >>>>>>> Python: 3.6
> > >>>>>>> BACKEND=mysql ENV=docker
> > >>>>>>>
> > >>>>>>> Python: 3.6
> > >>>>>>> BACKEND=postgres ENV=docker
> > >>>>>>>
> > >>>>>>> Python: 3.5
> > >>>>>>> BACKEND=sqlite ENV=docker
> > >>>>>>>
> > >>>>>>> Python: 3.6
> > >>>>>>> BACKEND=postgres ENV=kubernetes KUBERNETES_VERSION=v1.13.0
> > >>>>>>>
> > >>>>>>> J,
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Thu, Jun 27, 2019 at 11:00 AM Driesprong, Fokko
> > >>>>>>> <fokko@driesprong.frl> wrote:
> > >>>>>>>
> > >>>>>>>> We got this message last year:
> > >>>>>>>>
> > >>>>>>>>> Hello, Airflow PPMC.
> > >>>>>>>>> While going through the usage statistics for our Travis CI service,
> > >>>>>>>>> I have noticed that the Airflow project is using an abnormally large
> > >>>>>>>>> amount of resources, 2600 hours per month or the equivalent of
> > >>>>>>>>> having almost 4 machines building airflow non-stop 24/7. As this is
> > >>>>>>>>> not free, but rather costing us money, I'm contacting you with the
> > >>>>>>>>> intention of figuring out ways to reduce the use of Travis for the
> > >>>>>>>>> project.
> > >>>>>>>>
> > >>>>>>>>> We would greatly prefer that the project itself comes up with a
> > >>>>>>>>> solution to lower the usage of Travis, as we'd hate to simply turn
> > >>>>>>>>> it off for you, but the usage is at a rather severe level, totaling
> > >>>>>>>>> more than 21% of the total build time of all projects using Travis,
> > >>>>>>>>> so something actionable should be decided upon and (preferably)
> > >>>>>>>>> completed by the end of May that will reduce the consumption of
> > >>>>>>>>> Travis resources.
> > >>>>>>>>
> > >>>>>>>>> Alternately, if you are unable to lower the pressure on Travis, the
> > >>>>>>>>> podling and/or IPMC may ask the board of directors for a separate
> > >>>>>>>>> budget for additional build nodes to cope with the added load - I'll
> > >>>>>>>>> leave this for the podling and IPMC to decide on.
> > >>>>>>>>
> > >>>>>>>>> Please let us know when you have decided on a plan to remedy this
> > >>>>>>>>> situation.
> > >>>>>>>>
> > >>>>>>>>> With regards,
> > >>>>>>>>> Daniel on behalf of ASF Infrastructure.
> > >>>>>>>>
> > >>>>>>>> I think more and more projects are still migrating to the ASF
> > >>>>>>>> Travis, so I think it is natural that there is more load. However,
> > >>>>>>>> this still leaves the question of whether we have to run the full
> > >>>>>>>> matrix.
> > >>>>>>>>
> > >>>>>>>> Cheers, Fokko
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Thu, 27 Jun 2019 at 10:56, Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> I think we should really involve infra to increase the slot number
> > >>>>>>>>> or maybe even somehow allocate slots per project.
> > >>>>>>>>> The problem is that we cannot control what other apache projects are
> > >>>>>>>>> doing, so even if we decrease our runtime, it's the other projects
> > >>>>>>>>> that might hold us in the queue :(
> > >>>>>>>>>
> > >>>>>>>>> J.
> > >>>>>>>>>
> > >>>>>>>>> On Thu, Jun 27, 2019 at 10:19 AM Driesprong, Fokko
> > >>>>>>>>> <fokko@driesprong.frl> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> I've noticed this at other Apache projects as well; sometimes it
> > >>>>>>>>>> takes up to 7-8 hours. The only thing we can do is reduce the
> > >>>>>>>>>> runtime of the jobs so we take fewer slots :-)
> > >>>>>>>>>>
> > >>>>>>>>>> Cheers, Fokko
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, 26 Jun 2019 at 21:59, Jarek Potiuk
> > >>>>>>>>>> <Jarek.Potiuk@polidea.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Yep. That's what I suggested as the reason in the ticket - I guess
> > >>>>>>>>>>> INFRA are the only people who can do anything about it (increase
> > >>>>>>>>>>> concurrency? pay more for Travis :)? ).
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, Jun 26, 2019 at 9:51 PM Ash Berlin-Taylor <ash@apache.org>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> I asked Travis on Twitter and they said it was due to the other
> > >>>>>>>>>>>> Apache projects' build queues
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> https://twitter.com/travisci/status/1143893051460526080
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> -ash
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On 26 June 2019 20:48:33 BST, Jarek Potiuk
> > >>>>>>>>>>>> <Jarek.Potiuk@polidea.com> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Hello everyone,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> For the last few days the Travis builds for the apache/airflow
> > >>>>>>>>>>>>> project have been waiting in a queue for hours. This is not a
> > >>>>>>>>>>>>> normal situation. I've opened an INFRA ticket for that:
> > >>>>>>>>>>>>> https://issues.apache.org/jira/browse/INFRA-18657
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> J.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> --
> > >>>>>>>>>>>
> > >>>>>>>>>>> Jarek Potiuk
> > >>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >>>>>>>>>>>
> > >>>>>>>>>>> M: +48 660 796 129 <+48660796129>
> > >>>>>>>>>>> [image: Polidea] <https://www.polidea.com/>
