airflow-dev mailing list archives

From Jarek Potiuk <Jarek.Pot...@polidea.com>
Subject Re: AIP-7 Simplified development workflow
Date Thu, 03 Jan 2019 10:14:02 GMT
Hello everyone,

I am really, really happy to help with that, as it has been the focus of my
attention for the last couple of months in our team at Polidea.
Maybe we can reuse what we have built as our own development environment for
Airflow on Google Cloud Platform.

We are ready to share what we have done and contribute it to Apache in
whatever form is appropriate: either incorporating parts of what we've done,
or (possibly) using it as a starting point and adding what's missing from
the current Travis CI setup. I think the latter will be far easier and
faster, but that's just my opinion, as I know our setup very well now :).

Over the last few months at Polidea (my company) we developed (and contributed
to Airflow's contrib) more than 30 Google Cloud Platform related operators and
a number of bugfixes to core Airflow. We worked as a team (3 people) and
created a fairly complete, sophisticated and very well documented development
environment to be more productive and to work as a team. We are going to add
40 more operators and new team members in the coming months, so we had to be
productive :).

You can find our environment here:
https://github.com/PolideaInternal/airflow-breeze - we call it '*Airflow
breeze*' as in *"it's a breeze to work with Airflow and GCP"*. It's
targeted at making our Google Cloud Platform operator development easier,
but it already implements many of the things you are talking about:

*Supported features:*

   - A simplified, nicely layered Dockerfile
   <https://github.com/PolideaInternal/airflow-breeze/blob/master/Dockerfile>,
   optimised for build speed (especially the cassandra driver), that supports
   three Python versions: 2.7, 3.5 (used in Google Cloud Composer) and 3.6.
   Note that there are many compatibility problems between 3.5 and 3.6, so we
   introduced all three versions.
   - Google Cloud Build CI scripts are already part of the image (similar to
   what is suggested for the Travis CI scripts).
   - We dropped *tox* support in favour of Google Cloud Build parallel
   builds in separate Docker containers.
   - We have built-in support for unique naming of resources so that
   multiple builds can run at the same time without clashing.
   - We automate the local environment (virtualenvs) for running some unit
   and system tests locally, not only via the Docker container. This makes
   debugging far easier, for example from a local IDE.
   - Documentation on how to work with unit tests
   <https://github.com/PolideaInternal/airflow-breeze/blob/master/README.unittests.md>
   and system tests
   <https://github.com/PolideaInternal/airflow-breeze/blob/master/README.systemtests.md>
   (see below for a description of system tests), including how to integrate
   with IntelliJ/PyCharm and debug efficiently, including remote debugging of
   the environment (with some screenshots).
   - Support for automated Cloud Build and system tests
   - A nice, documented ./run_environment.sh
   <https://github.com/PolideaInternal/airflow-breeze#appendix-current-run_environment-flags>
   script that supports building the image and uploading/downloading it from
   the registry, choosing the GCP project id and service account keys, and
   multiple workspaces.
   - Documentation on prerequisites, setting up and bootstrapping the local
   project from scratch
   <https://github.com/PolideaInternal/airflow-breeze/blob/master/README.setup.md>,
   plus automation of the project checkout and the shared team configuration.
   This includes documentation on how to configure your local virtualenvs
   and manage the Docker image and the whole environment.
   - The Dockerfile and ./run_environment.sh are built so that local
   sources are shared with the Docker container, so you can edit your sources
   while running the tests in the container. This is super helpful for a fast
   development cycle.
   - A number of small, nice development features, such as bash history
   support in Docker, automated setting of common configuration variables
   shared within the team, etc.
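
As a rough illustration of the unique resource naming mentioned above (this
is not the airflow-breeze code), here is a minimal Python sketch. It assumes
every build has an ID, such as a Cloud Build build ID, and derives one
resource name per supported Python version, so parallel builds never share a
bucket or other resource. The `airflow-breeze` prefix and the naming scheme
are hypothetical.

```python
import re

# Python versions supported by the image, as listed above.
PYTHON_VERSIONS = ["2.7", "3.5", "3.6"]


def unique_resource_name(prefix: str, build_id: str, python_version: str) -> str:
    """Return a resource name unique to one build and one Python version.

    Hypothetical scheme: <prefix>-py<version without dots>-<first 8 chars of
    the build id>, lower-cased, with anything outside [a-z0-9-] replaced by
    "-", which keeps it valid for most GCP resource names (e.g. buckets).
    """
    raw = f"{prefix}-py{python_version.replace('.', '')}-{build_id[:8]}"
    return re.sub(r"[^a-z0-9-]", "-", raw.lower())


def names_for_build(prefix: str, build_id: str) -> dict:
    """Map each supported Python version to its own resource name."""
    return {v: unique_resource_name(prefix, build_id, v) for v in PYTHON_VERSIONS}
```

For example, `names_for_build("airflow-breeze", build_id)` gives three
distinct names, one per parallel Python-version build, all tied to the same
build ID so they can be cleaned up together afterwards.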

*What's missing:*

   - Compared to the current Travis CI setup, what is missing is Docker
   Compose support for external dependencies (MySQL etc.). This does not play
   well with Google Cloud Build and its docker-in-docker approach, but in
   Travis CI it should be perfectly fine to run the airflow-breeze image
   through Docker Compose. Alternatively, it might turn out easier to install
   MySQL within the image itself rather than use Docker Compose, which would
   make it much easier to multiply Docker instances and run them in parallel.
   In our environment we start a Postgres DB in Docker and run all system
   tests using the local executor + Postgres, and this way it's super easy to
   run tests on multiple environments, even on the same machine (this would
   be more complex with Docker Compose).
   - Breeze is also closely tied to Google Cloud through Cloud Build, but we
   can fairly easily make that an optional component. We also have not
   focused on Kubernetes workers, but as I understand we want to move to GKE,
   which would make it even better: we will need Google Cloud Platform
   integration baked in, we already have it, and we could use the same
   mechanisms. We can also leverage our contacts with the Google team and
   maybe ask Google to donate some recurring credits for a shared Airflow
   Google Cloud Platform project where we can integrate everything.
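
A minimal Python sketch of the "Postgres in Docker + local executor" setup
for several environments on one machine, under assumptions: each environment
gets its own Postgres container published on its own host port (hypothetical
scheme: 5432 + environment index), and settings are passed through Airflow
1.10's `AIRFLOW__CORE__*` environment variables. The `airflow:airflow`
credentials and the database name are placeholders.

```python
import socket
import time


def environment_settings(env_index: int, base_port: int = 5432) -> dict:
    """Airflow settings for the N-th parallel test environment (hypothetical scheme).

    Each environment talks to its own Postgres instance on its own host port,
    so several environments can run side by side on one machine.
    """
    port = base_port + env_index
    conn = f"postgresql+psycopg2://airflow:airflow@localhost:{port}/airflow"
    return {
        "AIRFLOW__CORE__EXECUTOR": "LocalExecutor",
        "AIRFLOW__CORE__SQL_ALCHEMY_CONN": conn,
    }


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until something accepts TCP connections on host:port, or give up.

    This is only a rough readiness check: an open port does not guarantee
    that Postgres is already accepting queries.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False
```

For example, environment 2 could pair with a container started via
`docker run -d -p 5434:5432 -e POSTGRES_USER=airflow -e POSTGRES_PASSWORD=airflow postgres`;
a real script may also need to retry the first database query after
`wait_for_port` returns.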


*Some more information about Airflow Breeze's Cloud Build support and
System Tests*

We have a design doc
<https://docs.google.com/document/d/15hdqL4bWU0646nAvxsEjIEr0gHOhMu6OByDWI1oiE7w/edit?usp=drive_web&ouid=112320280470690058978>
that describes the whole environment. A number of things there are GCP
related: we integrate with Google Cloud services (Cloud Build, Functions,
PubSub, Repositories) to run our automated System Tests. One interesting
feature of Airflow Breeze is the ability to easily configure and run
System Tests with Google Cloud Platform
(https://github.com/PolideaInternal/airflow-breeze/blob/master/README.systemtests.md).
We also have really nice Slack notifications
<https://github.com/PolideaInternal/airflow-breeze/blob/master/images/slack_notification.png>
after a build completes, plus an automated summary
<https://storage.googleapis.com/polidea-airflow-builds/6ed0e876-2fe3-41b4-90d0-4fa839901085/index.html>
showing the results of the automated system tests, automatically generated
documentation
<https://storage.googleapis.com/polidea-airflow-builds/6ed0e876-2fe3-41b4-90d0-4fa839901085/docs/index.html>
and logs from the system tests
<https://console.cloud.google.com/storage/browser/polidea-airflow-builds/6ed0e876-2fe3-41b4-90d0-4fa839901085/logs?project=polidea-airflow>.
We do not aim for it to replace Travis CI (which we also run); it's
complementary to Travis and runs only the relevant GCP unit tests and System
Tests against our real GCP project.
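
The automated summary page is not described in detail here, so the following
is only a sketch of one way to build such a summary, assuming the test runner
writes a JUnit-style XML report (for example via pytest's `--junitxml`
option). It counts passed, failed and skipped test cases and renders a
minimal HTML fragment; the function names are hypothetical.

```python
import html
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> dict:
    """Count outcomes in a JUnit-style XML report (assumed report format)."""
    root = ET.fromstring(xml_text)
    counts = {"passed": 0, "failed": 0, "skipped": 0}
    failed_names = []
    # iter() finds <testcase> elements whether the root is <testsuite> or <testsuites>.
    for case in root.iter("testcase"):
        name = f"{case.get('classname', '')}.{case.get('name', '')}"
        if case.find("failure") is not None or case.find("error") is not None:
            counts["failed"] += 1
            failed_names.append(name)
        elif case.find("skipped") is not None:
            counts["skipped"] += 1
        else:
            counts["passed"] += 1
    return {"counts": counts, "failed": failed_names}


def summary_html(summary: dict) -> str:
    """Render the summary as a minimal HTML fragment for a build results page."""
    c = summary["counts"]
    items = "".join(f"<li>{html.escape(n)}</li>" for n in summary["failed"])
    return (f"<p>passed: {c['passed']}, failed: {c['failed']}, skipped: {c['skipped']}</p>"
            f"<ul>{items}</ul>")
```

Treating `<error>` the same as `<failure>` keeps the summary to three
buckets, which is usually enough for a quick "is the build green" page.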

I initially described our intentions in AIP-4
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-4+Support+for+Integration+Tests>
(which is also mentioned in AIP-7), but I will soon change the AIP-4
description to match what we've actually developed for our own usage, which
is GCP-specific and not aimed at replacing Travis CI testing.

*A few more words about the Google Cloud Platform integration of Airflow Breeze*

Currently it is implemented so that each team can have its own Google Cloud
project ID to work on (or even several projects, because we support multiple
workspaces), and we have a way to easily bootstrap the GCP project from
scratch. That includes automated setup of all the required permissions,
service accounts and service APIs, creating and filling test buckets,
preparing Google Cloud Build triggers and so on, so literally in 20 minutes
you can have a new GCP project up and running, ready to run your system
tests.
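
The bootstrap steps listed above could look roughly like the following Python
sketch. It only builds the `gcloud`/`gsutil` command lines (enabling APIs,
creating a service account, granting it a role, creating test buckets) and
prints them in dry-run mode; the list of APIs, the role and all names are
assumptions, and Cloud Build triggers and filling the buckets are left out.

```python
import subprocess
from typing import List

# APIs for the services mentioned above; the exact list breeze enables is an assumption.
SERVICES = [
    "cloudbuild.googleapis.com",
    "cloudfunctions.googleapis.com",
    "pubsub.googleapis.com",
    "sourcerepo.googleapis.com",
]


def bootstrap_commands(project_id: str, sa_name: str, role: str,
                       buckets: List[str]) -> List[List[str]]:
    """Return (but do not run) the commands to prepare a fresh GCP project."""
    sa_email = f"{sa_name}@{project_id}.iam.gserviceaccount.com"
    cmds = [["gcloud", "services", "enable", s, f"--project={project_id}"]
            for s in SERVICES]
    cmds.append(["gcloud", "iam", "service-accounts", "create", sa_name,
                 f"--display-name={sa_name}", f"--project={project_id}"])
    cmds.append(["gcloud", "projects", "add-iam-policy-binding", project_id,
                 f"--member=serviceAccount:{sa_email}", f"--role={role}"])
    cmds.extend(["gsutil", "mb", "-p", project_id, f"gs://{b}"] for b in buckets)
    return cmds


def run_all(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them one by one and stop on the first failure."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Keeping the plan as plain data makes it easy to review (or diff) what a
bootstrap will do before running it against a real project.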

I would be super happy if we could contribute what we've done there.
Currently we have a very small commit that we cherry-pick into our branches
to be able to use automated Cloud Build (namely the cloudbuild.yaml file,
similar to .travis.yml), but if we can modify it and make it part of the
main Apache project, we would be more than happy to do it!

Let me know what you think!

J.


On Wed, Jan 2, 2019 at 10:57 PM Daniel Imberman <daniel.imberman@gmail.com>
wrote:

> Hi guys, I've set up a few sub-projects for this. @gerardo @fokko Lemme
> know what you guys think
>
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/Optimizing+Docker+Image+Workflow
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/Kubernetes+Testing%3A+Using+GKE+instead+of+Minikube
>
> On Tue, Jan 1, 2019 at 11:45 PM Driesprong, Fokko <fokko@driesprong.frl>
> wrote:
>
> > Hi Gerardo,
> >
> > Very valid points. I'm fully in favor of your proposal. To simplify the
> > stack, I strongly believe we should also strip out tox and fully rely on
> > Docker. Using tox will add another layer that doesn't add a lot of value
> > from my perspective. Also, we should bake all the *.sh bootstrap scripts
> > <https://github.com/apache/incubator-airflow/tree/master/scripts/ci> in
> > the Docker container, instead of having to set this up before running the
> > tests.
> >
> > In the upcoming months, I might have a bit more time to spend on Airflow,
> > I'm happy to assist you on this one.
> >
> > Cheers, Fokko
> >
> > On Wed, Jan 2, 2019 at 06:51 Daniel Imberman <daniel.imberman@gmail.com> wrote:
> >
> > > @gerardo thank you for setting this up.
> > >
> > > I've also been extremely interested in this. I've been messing with
> > > GCP VM instances in the past few weeks to try to simplify my local build
> > > as well. Would definitely be interested in helping with the AIP +
> > > implementation.
> > >
> > > One thing I believe we should do is set up the CI base image with all of
> > > the pip dependencies pre-loaded. A lot of time is wasted pip installing
> > > dependencies. We can auto-generate new images whenever a PR is submitted
> > > to this repository and then specify the tag in the .travis.yml when
> > > building.
> > >
> > > On the k8s side, I think we need to move away from minikube for k8s
> > > testing. I discussed in a previous email setting up travis to work with
> > > GKE. I'd be careful about coupling k8s stuff too tightly with a docker
> > > infrastructure. That can get pretty dicey. I think as long as we're using
> > > a separate k8s cluster, the k8s executor tests only need to gather the IP
> > > addresses + have access to the kubeconfig.
> > >
> > >
> > > On Tue, Jan 1, 2019 at 8:10 PM Gerardo Curiel <gerardo@gerar.do> wrote:
> > >
> > > > Hi folks,
> > > >
> > > > I've created an AIP for simplifying Airflow's development workflow:
> > > >
> > > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-7+Simplified+development+workflow
> > > >
> > > > The goal of this proposal is to outline the work needed to make local
> > > > testing significantly easier and standardise the best practices to
> > > > contribute to the Airflow project.
> > > >
> > > > Any input on it would be greatly appreciated.
> > > >
> > > > Cheers,
> > > >
> > > > --
> > > > Gerardo Curiel // https://gerar.do
> > >
> >
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129
E: jarek.potiuk@polidea.com
