airflow-dev mailing list archives

From Gerard Toonstra <gtoons...@gmail.com>
Subject Data Vault on Hive + Airflow example
Date Thu, 01 Mar 2018 07:26:13 GMT
Yesterday I finished the draft of a new example on the "ETL with Airflow"
site. This example explores the "Data Vault" methodology on top of Hive,
100% orchestrated by Airflow:

https://gtoonstra.github.io/etl-with-airflow/datavault2.html

The theory of the data vault is that you can change the business rules for
how data gets transformed, applied and calculated over time. This is
helpful because you don't need agreements up front when designing a DWH,
and you have more flexibility to work out what's needed over time (i.e.,
you don't get pinned down by design choices made months or years earlier).
It reduces the need for consensus and meetings before you even get started
with coding.

As always, looking for input and suggestions on the example and code
provided.

Best regards,

Gerard
