airflow-dev mailing list archives

From Gerard Toonstra
Subject ETL best practices for airflow
Date Sun, 16 Oct 2016 21:23:34 GMT
Hi all,

About a year ago, I contributed the HTTPOperator/Sensor and I've been
tracking airflow since. Right now it looks like we're going to adopt
airflow at the company I'm currently working at.

In preparation for that, I've done a bit of research into how airflow
pipelines should fit together and how important ETL principles are covered,
and decided to write this up on a documentation site. The airflow
documentation site covers how airflow works and the constructs that
you have available to build pipelines, but it can still be a challenge for
newcomers to figure out how to put those constructs together.

The articles I found online don't go into a lot of detail either. Airflow
is built around an important philosophy towards ETL and there's a risk that
newcomers simply pick up a really great tool and start off in the wrong way
when using it.

This weekend, I set off to write some documentation to try to fill this
gap. It starts off with a generic understanding of important ETL principles
and I'm currently working on a practical step-by-step example that adheres
to these principles with DAG implementations in airflow; i.e. showing how
it can all fit together.

You can find the current version here:

Looking forward to your comments. If you have ideas on how I can improve
this contribution, don't hesitate to contact me with your suggestions.

Best regards,

