airflow-dev mailing list archives

From "Driesprong, Fokko" <fo...@driesprong.frl>
Subject Re: Travis builds in a queue for hours
Date Thu, 11 Jul 2019 07:26:28 GMT
Yes, GitLab works very well with GCP. A Kubernetes cluster with autoscaling
for the runners would be perfect, and it would also minimize the resources
needed from Google.

Cheers, Fokko

On Thu, 11 Jul 2019 at 07:13, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:

> Since more than a few people (including myself) are in favour of GitLab CI,
> and since Apache Infra is talking to GitLab CI, I will make sure to check
> whether we can combine the two approaches: workers from Google and a managed,
> central GitLab CI interface to manage them (likely run by the Infra team).
> Airflow can easily be a "guinea pig" for GitLab CI / Apache integration.
> We also have a lot of expertise in managing GitLab at my company (at
> Polidea we use GitLab for the CI of most of our mobile projects and for all
> the cloud builds that we run internally).
>
> I will make an AIP for that soon and involve the right people :).
>
> J.
>
> On Thu, Jul 11, 2019 at 8:01 AM Driesprong, Fokko <fokko@driesprong.frl> wrote:
>
> > Regarding the numbers, I believe that INFRA has an overview of the usage
> > per project. I think they will be happy to share these numbers with you.
> > It also seems that there is a queue in Jenkins as well:
> > https://status.apache.org/
> >
> > Speaking of Jenkins: I'm not a big fan of it. Spark, for example, uses it,
> > and in contrast with Travis it is rather difficult to set up the
> > environment yourself. I also have good experiences with GitLab, which in
> > my personal opinion is the only Docker-native CI.
> >
> > > But we can try both of course. And even switch later.
> > There is nothing as permanent as a temporary solution :-) However, I'm not
> > against trying. I've checked the Beam project, and the integration with
> > GitHub looks good.
> >
> > Thanks again, Jarek and Aizhamal, for all the work and effort.
> >
> > Cheers, Fokko
> >
> >
> >
> >
> > On Wed, 10 Jul 2019 at 23:11, Aizhamal Nurmamat kyzy <aizhamal@apache.org> wrote:
> >
> > > Hi all,
> > >
> > > I am still working on trying to get approvals for this, so this is not
> > > yet a done deal. I'll keep y'all updated.
> > >
> > > As for the CI solution to use, we have no particular inclination, as
> > > long as the community supports it and it is consistent with any Apache
> > > guidelines for CI in Apache projects. Jenkins and GitLab CI both sound
> > > sensible.
> > >
> > > The email from INFRA says that Airflow runs 2600 hours of tests per
> > > month, or the equivalent of about 4 machines. Can the community help with
> > > a reasonable estimate of what we need, so I can use it as a reference for
> > > the request?
> > >
> > > Thanks!
> > >
> > > On Wed, Jul 10, 2019 at 2:43 PM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > >
> > > > Yeah. GitLab CI is definitely what I would prefer as well from the
> > > > "modernity" point of view (and one of my very close friends is a GitLab
> > > > CI maintainer and actually the person who introduced CI to the GitLab
> > > > offering). I have also already catalysed a discussion between GitLab and
> > > > Apache Infrastructure about introducing GitLab CI at the "Apache" level
> > > > (they are talking about it now, I believe).
> > > >
> > > > But from the Google <> Apache procedural point of view it might simply
> > > > be easier to follow in the footsteps of Apache Beam. It might be just a
> > > > few clicks for Apache Infrastructure to add more machines and connect
> > > > them to the Apache Jenkins for our project. If others have already
> > > > cleared a path, following it might simply be much faster.
> > > >
> > > > But we can try both of course. And even switch later. The Docker CI
> > > > approach I am about to merge is designed to make switching between CI
> > > > systems super easy. Virtually ALL the build logic is in scripts in shared
> > > > Docker images. There is basically one file to add per CI system, and we
> > > > can support Travis/Jenkins/CloudBuild/CircleCI, whatever we imagine. We
> > > > can even support all of them at the same time :)
> > > >
> > > > J.
> > > >
> > > > On Wed, Jul 10, 2019 at 11:32 PM Bolke de Bruin <bdbruin@gmail.com> wrote:
> > > >
> > > > > If you need an alternative, why not use a couple of gitlab-ci runners?
> > > > > They are much easier to maintain, lightweight, and much closer to what
> > > > > we use now.
> > > > >
> > > > > B.
> > > > >
> > > > > Sent from my iPad
> > > > >
> > > > > > On 10 Jul 2019, at 23:27, Bolke de Bruin <bdbruin@gmail.com> wrote:
> > > > > >
> > > > > > Awesome! But I hope you are not serious about using Jenkins, right? If
> > > > > > I had to start a holy war, it would be against Jenkins.
> > > > > >
> > > > > > B.
> > > > > >
> > > > > >> On 10 Jul 2019, at 22:55, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > >>
> > > > > >> Hello Everyone,
> > > > > >>
> > > > > >> I have some really good news. I just had a call with the Google OSS
> > > > > >> team (Gris, Aizhamal) and they are willing to donate VMs on Google
> > > > > >> Cloud Platform to run CI for Airflow. In order to simplify the setup
> > > > > >> (and make sure it is OK according to Apache regulations), we think we
> > > > > >> should go exactly the same route as the Apache Beam project (Google
> > > > > >> donated 16 machines with 16 CPUs each for them). The Apache Beam route
> > > > > >> is to use the machines as workers for Apache Jenkins
> > > > > >> (https://builds.apache.org/). Apache Jenkins is one of the CI solutions
> > > > > >> encouraged by Apache, and if we can have workers connected to the
> > > > > >> existing Apache Jenkins master, the maintenance overhead will be pretty
> > > > > >> minimal. And since we can follow the Apache Beam setup, I do not expect
> > > > > >> any legal problems.
> > > > > >>
> > > > > >> I also work very closely with the team that uses Apache Beam's Jenkins
> > > > > >> heavily, so I have access to all the necessary experts to help with
> > > > > >> the setup (and I am happy to help with that).
> > > > > >>
> > > > > >> I really hope everyone in the community will be happy to go in that
> > > > > >> direction. Please let me know if you have any concerns!
> > > > > >>
> > > > > >> We certainly do not need as many machines as Beam (Beam uses the
> > > > > >> machines to process a lot of data in tests, including some load
> > > > > >> testing), but we need to estimate the number and types of machines we
> > > > > >> are going to need. Fokko, Ash, others: do you have some recent numbers
> > > > > >> for the current usage, or should I open an Infrastructure ticket for it?
> > > > > >>
> > > > > >> J
> > > > > >>
> > > > > >> On Fri, Jun 28, 2019 at 4:50 PM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > >>
> > > > > >>> Thanks Aizhamal! I have already spoken to Gris and she confirmed
> > > > > >>> that as well, and the 8th of July date is OK for us, as we will have
> > > > > >>> to evaluate and prepare as well. Have a nice trip.
> > > > > >>>
> > > > > >>> J.
> > > > > >>>
> > > > > >>> On Fri, Jun 28, 2019 at 4:25 PM Aizhamal Nurmamat kyzy <aizhamal@google.com.invalid> wrote:
> > > > > >>>
> > > > > >>>> Hi all,
> > > > > >>>>
> > > > > >>>> On Thu, Jun 27, 2019 at 15:28 Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > >>>>
> > > > > >>>>> Yeah. I also have a working version of the Cloud Build
> > > > > >>>>> configuration, and we can run the tests on Cloud Build if we can get
> > > > > >>>>> some credits from Google.
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> I can look into getting a small amount of credits approved for this,
> > > > > >>>> to see if it's useful to offload some tests to Cloud Build, or to
> > > > > >>>> provision some VMs to run on Apache Infra.
> > > > > >>>>
> > > > > >>>> I am traveling at the moment, but I'll be back in the office on
> > > > > >>>> July 8, and I'll try to get this done.
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> Thanks,
> > > > > >>>> Aizhamal
> > > > > >>>>
> > > > > >>>>> And the changes from the upcoming CI image will make it much easier
> > > > > >>>>> to run tests on any CI provider. Apart from the Kubernetes tests,
> > > > > >>>>> they are pretty much CI-agnostic, and the Kubernetes tests will
> > > > > >>>>> likely be fixed soon as well.
> > > > > >>>>>
> > > > > >>>>> Another idea: in the future we could also run only a subset of the
> > > > > >>>>> postgres/mysql/sqlite tests on all combinations. I think there are
> > > > > >>>>> just a handful of backend-specific tests (and we already know which
> > > > > >>>>> ones they are: they are the ones with skip-if conditions).
> > > > > >>>>>
> > > > > >>>>> J.
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> On Thu, 27 Jun 2019, 15:12, Philippe Gagnon <philgagnon1@gmail.com> wrote:
> > > > > >>>>>
> > > > > >>>>>> I think the combinations that you are proposing are sensible for
> > > > > >>>>>> pre-merge checks.
> > > > > >>>>>>
> > > > > >>>>>> I am working on a proposal to offload extra combinations to another
> > > > > >>>>>> CI provider (Azure DevOps specifically seems like a good candidate),
> > > > > >>>>>> either pre- or post-merge. Ideally I'd like to run more combinations
> > > > > >>>>>> pre-merge, but there is a trade-off to be conscious of here between
> > > > > >>>>>> development velocity and quality assurance, which I think this issue
> > > > > >>>>>> highlights quite well.
> > > > > >>>>>>
> > > > > >>>>>> Please let me know your thoughts.
> > > > > >>>>>>
> > > > > >>>>>> Philippe
> > > > > >>>>>>
> > > > > >>>>>> On Thu, Jun 27, 2019 at 9:05 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > >>>>>>
> > > > > >>>>>>> Agreed that we should be thoughtful about others as well: in the
> > > > > >>>>>>> latest push (a few minutes ago) of the upcoming official CI image, I
> > > > > >>>>>>> implemented the change we discussed on GitHub, where we limit the
> > > > > >>>>>>> number of combinations we test:
> > > > > >>>>>>>
> > > > > >>>>>>> You can see it yourself:
> > > > > >>>>>>> https://travis-ci.org/apache/airflow/builds/551305240
> > > > > >>>>>>>
> > > > > >>>>>>> Those are the combinations I propose:
> > > > > >>>>>>>
> > > > > >>>>>>> Python: 3.6
> > > > > >>>>>>> BACKEND=mysql ENV=docker
> > > > > >>>>>>>
> > > > > >>>>>>> Python: 3.6
> > > > > >>>>>>> BACKEND=postgres ENV=docker
> > > > > >>>>>>>
> > > > > >>>>>>> Python: 3.5
> > > > > >>>>>>> BACKEND=sqlite ENV=docker
> > > > > >>>>>>>
> > > > > >>>>>>> Python: 3.6
> > > > > >>>>>>> BACKEND=postgres ENV=kubernetes KUBERNETES_VERSION=v1.13.0
> > > > > >>>>>>>
> > > > > >>>>>>> J.
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> On Thu, Jun 27, 2019 at 11:00 AM Driesprong, Fokko <fokko@driesprong.frl> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>> We got this message last year:
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Hello, Airflow PPMC.
> > > > > >>>>>>>>> While going through the usage statistics for our Travis CI service, I
> > > > > >>>>>>>>> have noticed that the Airflow project is using an abnormally large
> > > > > >>>>>>>>> amount of resources, 2600 hours per month or the equivalent of having
> > > > > >>>>>>>>> almost 4 machines building airflow non-stop 24/7. As this is not free,
> > > > > >>>>>>>>> but rather costing us money, I'm contacting you with the intention of
> > > > > >>>>>>>>> figuring out ways to reduce the use of Travis for the project.
> > > > > >>>>>>>>
> > > > > >>>>>>>>> We would greatly prefer that the project itself comes up with a
> > > > > >>>>>>>>> solution to lower the usage of Travis, as we'd hate to simply turn it
> > > > > >>>>>>>>> off for you, but the usage is at a rather severe level, totaling more
> > > > > >>>>>>>>> than 21% of the total build time of all projects using Travis, so
> > > > > >>>>>>>>> something actionable should be decided upon and (preferably) completed
> > > > > >>>>>>>>> by the end of May that will reduce the consumption of Travis resources.
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Alternately, if you are unable to lower the pressure on Travis, the
> > > > > >>>>>>>>> podling and/or IPMC may ask the board of directors for a separate
> > > > > >>>>>>>>> budget for additional build nodes to cope with the added load - I'll
> > > > > >>>>>>>>> leave this for the podling and IPMC to decide on.
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Please let us know when you have decided on a plan to remedy this
> > > > > >>>>>>>>> situation.
> > > > > >>>>>>>>
> > > > > >>>>>>>>> With regards,
> > > > > >>>>>>>>> Daniel on behalf of ASF Infrastructure.
> > > > > >>>>>>>>
> > > > > >>>>>>>> I think more and more projects are still migrating to the ASF Travis,
> > > > > >>>>>>>> so I think it is natural that there is more load. However, this still
> > > > > >>>>>>>> leaves the question of whether we have to run the full matrix.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Cheers, Fokko
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> On Thu, 27 Jun 2019 at 10:56, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>>> I think we should really involve INFRA to increase the number of
> > > > > >>>>>>>>> slots, or maybe even somehow allocate slots per project.
> > > > > >>>>>>>>> The problem is that we cannot control what other Apache projects are
> > > > > >>>>>>>>> doing, so even if we decrease our runtime, other projects might still
> > > > > >>>>>>>>> hold us in the queue :(
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> J.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> On Thu, Jun 27, 2019 at 10:19 AM Driesprong, Fokko <fokko@driesprong.frl> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>> I've noticed this at other Apache projects as well; sometimes it
> > > > > >>>>>>>>>> takes up to 7-8 hours. The only thing we can do is reduce the runtime
> > > > > >>>>>>>>>> of the jobs so we take fewer slots :-)
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Cheers, Fokko
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> On Wed, 26 Jun 2019 at 21:59, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>> Yep. That's what I suggested as the reason in the ticket. I guess
> > > > > >>>>>>>>>>> INFRA are the only people who can do anything about it (increase
> > > > > >>>>>>>>>>> concurrency? pay more for Travis :)?).
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> On Wed, Jun 26, 2019 at 9:51 PM Ash Berlin-Taylor <ash@apache.org> wrote:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>> I asked Travis on Twitter and they said it was due to the build
> > > > > >>>>>>>>>>>> queues of the other Apache projects.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> https://twitter.com/travisci/status/1143893051460526080
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> -ash
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> On 26 June 2019 20:48:33 BST, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Hello everyone,
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> For the last few days, the Travis builds for the apache/airflow
> > > > > >>>>>>>>>>>>> project have been waiting in a queue for hours. This is not a
> > > > > >>>>>>>>>>>>> normal situation. I've opened an INFRA ticket for that:
> > > > > >>>>>>>>>>>>> https://issues.apache.org/jira/browse/INFRA-18657
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> J.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> --
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Jarek Potiuk
> > > > > >>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> M: +48 660 796 129
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > > >
> > > >
> > >
> >
>
>
>
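
The INFRA figure quoted in this thread (2600 build-hours per month, "the equivalent of having almost 4 machines building airflow non-stop 24/7") can be sanity-checked with a small calculation, which may help when estimating how many donated workers to ask for. A minimal sketch, assuming an average month of 365.25 / 12 days; the function name is illustrative only and not part of any Airflow or ASF tooling:

```python
# Rough conversion of monthly CI build-hours into "machines busy 24/7".
# Assumption: an average month of 365.25 / 12 days (about 30.44 days).
HOURS_PER_DAY = 24
AVG_DAYS_PER_MONTH = 365.25 / 12


def equivalent_machines(build_hours_per_month: float) -> float:
    """Return how many machines would need to build non-stop to log these hours."""
    machine_hours_per_month = HOURS_PER_DAY * AVG_DAYS_PER_MONTH  # 730.5 hours
    return build_hours_per_month / machine_hours_per_month


if __name__ == "__main__":
    # 2600 is the monthly usage reported in the ASF Infrastructure message.
    print(f"{equivalent_machines(2600):.2f} machines")  # prints: 3.56 machines
```

The result of roughly 3.6 machines matches the "almost 4 machines" in the INFRA message. It only counts busy build time, so covering peak-hour queueing may require some headroom on top of it.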

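Jarek's idea of running only the backend-specific tests on every postgres/mysql/sqlite combination can be sketched with Python's standard `unittest` module. This is a hypothetical illustration, not Airflow's actual test setup: instead of the skip-if conditions mentioned in the thread, it uses an invented `backend_specific` marker. It also assumes the CI job exports the backend under test in a `BACKEND` environment variable (as in the `BACKEND=mysql ENV=docker` matrix entries), and it treats `postgres` as the one backend whose job also runs the backend-agnostic tests.

```python
import os
import unittest

# Hypothetical choice: the one backend whose job also runs backend-agnostic tests.
PRIMARY_BACKEND = "postgres"


def backend_specific(test_func):
    """Hypothetical marker for tests whose behaviour depends on the database backend."""
    test_func.backend_specific = True
    return test_func


def iter_tests(suite):
    """Flatten a (possibly nested) unittest.TestSuite into individual test cases."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def select_tests(suite, backend):
    """Keep every test on the primary backend; elsewhere keep only marked tests."""
    selected = unittest.TestSuite()
    for test in iter_tests(suite):
        # test.id() ends with the test method name, e.g. "module.Class.test_x".
        method = getattr(test, test.id().rsplit(".", 1)[-1], None)
        if backend == PRIMARY_BACKEND or getattr(method, "backend_specific", False):
            selected.addTest(test)
    return selected


class ExampleTests(unittest.TestCase):
    @backend_specific
    def test_backend_quirk(self):
        # Placeholder for a test that behaves differently per backend.
        self.assertTrue(True)

    def test_backend_agnostic(self):
        # Placeholder for a test that only needs to run once.
        self.assertTrue(True)


if __name__ == "__main__":
    backend = os.environ.get("BACKEND", PRIMARY_BACKEND)
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
    unittest.TextTestRunner(verbosity=2).run(select_tests(suite, backend))
```

With `BACKEND=mysql` only `test_backend_quirk` runs, while with `BACKEND=postgres` both tests run. This is the trade-off Philippe describes: fewer test runs per job, in exchange for backend-agnostic code being exercised on only one backend.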