airflow-dev mailing list archives

From Laura Lorenz <>
Subject Re: ETL best practices for airflow
Date Mon, 17 Oct 2016 21:19:33 GMT
Same! I actually recently gave a talk about how my company uses airflow at
PyData DC. The video isn't live yet, but the slides are here
In substance it's actually very similar to what you've written.

I have some airflow-specific ideas about ways to write custom sensors that
poll job APIs (pretty common for us). We also generate tasks dynamically
from external metadata by embedding an API call in the DAG definition
file, though I'm not sure whether that is a best practice...
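
A minimal sketch of those two ideas in plain Python, so it runs without Airflow installed. Every name here (`job_is_done`, `wait_for_job`, `build_task_ids`, the status strings, the `table` key) is hypothetical and not taken from our codebase or from Airflow. In Airflow itself the per-poll check would typically live in the `poke` method of a `BaseSensorOperator` subclass, and the generated ids would be used to create operators in a loop inside the DAG definition file.

```python
import time
from typing import Callable, Dict, List


def job_is_done(fetch_status: Callable[[str], str], job_id: str) -> bool:
    """One poll: True once the (hypothetical) job API reports success."""
    status = fetch_status(job_id)
    if status == "failed":
        # Fail loudly instead of polling a job that will never finish.
        raise RuntimeError("job %s failed" % job_id)
    return status == "succeeded"


def wait_for_job(
    fetch_status: Callable[[str], str],
    job_id: str,
    poke_interval: float = 0.0,
    max_pokes: int = 10,
) -> bool:
    """Repeat the check on an interval, the way a sensor re-runs its poll."""
    for _ in range(max_pokes):
        if job_is_done(fetch_status, job_id):
            return True
        time.sleep(poke_interval)
    return False  # gave up after max_pokes attempts


def build_task_ids(fetch_metadata: Callable[[], List[Dict[str, str]]]) -> List[str]:
    """Derive one task id per record returned by the metadata API."""
    return ["load_%s" % record["table"] for record in fetch_metadata()]


# Stubbed "APIs" so the sketch runs without a network.
_statuses = iter(["queued", "running", "succeeded"])
print(wait_for_job(lambda job_id: next(_statuses), "job-42"))  # True
print(build_task_ids(lambda: [{"table": "orders"}, {"table": "users"}]))
# ['load_orders', 'load_users']
```

One caveat on the second half: the scheduler re-parses DAG definition files regularly, so an API call embedded at the top level runs on every parse, and a slow or failing metadata service affects DAG loading. That is part of why I hedge on calling it a best practice.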

Anyway, if it makes sense to contribute these case studies for
consideration as 'best practices', and if this is the place and way to do it,
I'm game. I agree that the resources and thought leadership on ETL design
are fragmented, and I think the Airflow community is fertile ground for
discussion about it.

On Sun, Oct 16, 2016 at 6:40 PM, Boris Tyukin <> wrote:

> I really look forward to it, Gerard! I've read what you wrote so far
> and I really liked it - please keep up the great job!
> I am hoping to see some best practices for the design of incremental loads
> and using timestamps from source database systems (not being on UTC so
> still confused about it in Airflow). Also practical use of subdags and
> dynamic generation of tasks using some external metadata (maybe describe in
> detail something similar to what wepay did
> On Sun, Oct 16, 2016 at 5:23 PM, Gerard Toonstra <>
> wrote:
> > Hi all,
> >
> > About a year ago, I contributed the HTTPOperator/Sensor and I've been
> > tracking airflow since. Right now it looks like we're going to adopt
> > airflow at the company I'm currently working at.
> >
> > In preparation for that, I've done a bit of research into how airflow
> > pipelines should fit together and how important ETL principles are
> > covered, and decided to write this up on a documentation site. The
> > airflow documentation site contains everything on how airflow works and
> > the constructs that you have available to build pipelines, but it can
> > still be a challenge for newcomers to figure out how to put those
> > constructs together to use it effectively.
> >
> > The articles I found online don't go into a lot of detail either. Airflow
> > is built around an important philosophy towards ETL and there's a risk
> > that newcomers simply pick up a really great tool and start off in the
> > wrong way when using it.
> >
> >
> > This weekend, I set off to write some documentation to try to fill this
> > gap. It starts off with a generic understanding of important ETL
> > principles and I'm currently working on a practical step-by-step example
> > that adheres to these principles with DAG implementations in airflow;
> > i.e. showing how it can all fit together.
> >
> > You can find the current version here:
> >
> >
> >
> >
> > Looking forward to your comments. If you have better ideas on how I can
> > make this contribution, don't hesitate to contact me with your suggestions.
> >
> > Best regards,
> >
> > Gerard
> >
