incubator-cvs mailing list archives

From Apache Wiki <>
Subject [Incubator Wiki] Update of "AirflowProposal" by ChrisRiccomini
Date Wed, 16 Mar 2016 23:53:20 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "AirflowProposal" page has been changed by ChrisRiccomini:

New page:
== Abstract ==

Airflow is a workflow automation and scheduling system that can be used to author and manage
data pipelines.

== Proposal ==

Airflow provides a system for authoring and managing workflows, also known as data pipelines or
DAGs (Directed Acyclic Graphs). The developer authors DAGs in Python using an Airflow-provided
framework, then executes the DAG using Airflow’s scheduler or registers the DAG for
event-based execution. A web-based UI provides the developer with a range of options for managing
and viewing data pipelines.
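
To illustrate what defining a workflow as a DAG in Python code can look like, here is a minimal sketch using only the Python standard library. It does not use Airflow's actual framework: the `MiniDag` class, its `task` and `run` methods, and the task names are all hypothetical and exist only for this example. It shows tasks declared with their upstream dependencies and then run in an order that respects them.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class MiniDag:
    """Hypothetical toy workflow: named tasks plus their upstream dependencies.

    This is NOT Airflow's API, only an illustration of the DAG idea.
    """

    def __init__(self, name):
        self.name = name
        self.tasks = {}     # task name -> callable that does the work
        self.upstream = {}  # task name -> set of task names that must run first

    def task(self, name, func, after=()):
        """Register a task and the tasks it depends on; return its name."""
        self.tasks[name] = func
        self.upstream[name] = set(after)
        return name

    def run(self):
        """Run every task once, in an order that respects the dependencies."""
        # static_order() raises graphlib.CycleError if the graph has a cycle.
        order = list(TopologicalSorter(self.upstream).static_order())
        results = {name: self.tasks[name]() for name in order}
        return order, results


# A three-step pipeline defined in a single file.
dag = MiniDag("daily_report")
extract = dag.task("extract", lambda: "raw rows")
transform = dag.task("transform", lambda: "clean rows", after=[extract])
load = dag.task("load", lambda: "loaded", after=[transform])

order, results = dag.run()
print(order)
```

Because the structure lives in ordinary code, changing the pipeline means editing the `after=` arguments rather than rearranging separate configuration files.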

Airflow was developed at Airbnb to enable easier authorship and management of DAGs than was
possible with existing solutions such as Oozie and Azkaban. For starters, both Oozie and Azkaban
rely on one or more XML or property files bundled together to define a workflow. This
separation of code and config can make a DAG harder to understand. In Azkaban,
a DAG’s structure is reflected by its file system tree, so one can end up
traversing the file system when inspecting or changing the structure of the DAG. Airflow workflows,
on the other hand, are simply and elegantly defined in Python code, often in a single file. Airflow
merges the powerful Web-based management aspects of projects like Azkaban and Oozie with the
simplicity and elegance of defining workflows in Python. Airflow, less than a year old in
terms of its Open Source launch, is currently used in production environments at more than
30 companies and boasts an active contributor list of more than 100 developers, the vast majority
of whom (>95%) are outside of Airbnb.

We would like to share it with the ASF and begin developing a community of developers and
users within Apache.

== Rationale ==

Many organizations (>30) already benefit from running Airflow to manage data pipelines.
Our 100+ contributors continue to provide integrations with 3rd party systems through the
implementation of new hooks and operators, both of which are used in defining the tasks that
compose workflows. 

== Current Status ==

=== Meritocracy ===

Our intent with this incubator proposal is to start building a diverse developer community
around Airflow following the Apache meritocracy model. Since Airflow was open-sourced in mid-2015,
we have had fast adoption and contributions by multiple organizations the world over. We plan
to continue to support new contributors and we will work to actively promote those who contribute
significantly to the project to committers.

=== Community ===

Airflow is currently being used at over 30 companies. We hope to extend our contributor base
significantly and invite all those who are interested in building large-scale distributed
systems to participate.

=== Core Developers ===

Airflow is currently being developed by four engineers: Maxime Beauchemin, Siddharth Anand,
Bolke de Bruin, and Chris Riccomini. Chris is a member of the Apache Samza PMC and a contributor
to various Apache projects, including Apache Kafka and Apache YARN. Maxime, Siddharth, and
Bolke have contributed to Airflow.

=== Alignment ===
The ASF is the natural choice to host the Airflow project as its goal of encouraging community-driven
open-source projects fits with our vision for Airflow. 

== Known Risks ==

=== Orphaned Products ===

The core developers plan to work part time on the project. There is very little risk of Airflow
being abandoned as all of our companies rely on it.

=== Inexperience with Open Source ===

All of the core developers have experience with open source development. Chris is a member
of the Apache Samza PMC and a contributor to various Apache projects, including Apache Kafka
and Apache YARN. Bolke is a contributor to multiple open source projects, including a few Apache
projects: Apache Hive, Apache Hadoop, and Apache Ranger.

=== Homogeneous Developers ===

The current core developers are all from different companies. Our community of more than 100
contributors hails from over 30 different companies across the world.

=== Reliance on Salaried Developers ===

Currently, the only developer paid to work on this project is Maxime. 

=== Relationships with Other Apache Products ===

Airflow is deeply integrated with Apache products. It currently provides hooks and operators
to enable workflows to leverage Apache Pig, Apache Hive, Apache Spark, Apache Sqoop, Apache
Hadoop, and others. We plan to add support for other Apache projects in the future.

=== An Excessive Fascination with the Apache Brand ===

While we respect the reputation of the Apache brand and have no doubts that it will attract
contributors and users, our interest is primarily to give Airflow a solid home as an open
source project following an established development model. We have also given reasons in the
Rationale and Alignment sections.

== Documentation ==

== Initial Source ==

== External Dependencies ==
The dependencies all have Apache compatible licenses.

== Cryptography ==

== Required Resources ==

=== Mailing Lists ===

airflow-private for private PMC discussions (with moderated subscriptions) 

=== Subversion Directory ===

Git is the preferred source control system: git://

=== Issue Tracking ===
JIRA Airflow (Airflow)

=== Other Resources ===

The existing code already has unit tests, so we would like a Travis instance to run them whenever
a new patch is submitted. This can be added after project creation.

== Initial Committers ==

 * Maxime Beauchemin
 * Siddharth Anand
 * Chris Riccomini
 * Bolke de Bruin
 * Arthur Wiedmer
 * Dan Davydov

== Affiliations ==

 * Maxime Beauchemin (Airbnb)
 * Siddharth Anand (Agari)
 * Chris Riccomini (WePay)
 * Bolke de Bruin (ING)
 * Arthur Wiedmer (Airbnb)
 * Dan Davydov (Airbnb)

== Sponsors ==

=== Champion ===

Chris Riccomini (WePay, Apache Samza PMC)

=== Nominated Mentors ===

 * Chris Nauroth (HortonWorks, Apache Hadoop Committer/PMC Member, Apache ZooKeeper Committer,
Apache Software Foundation Member)
 * Hitesh Shah (HortonWorks, Apache Hadoop Committer/PMC Member, Apache Ambari Committer/PMC
Member, Apache Tez Committer/PMC/VP Member, Apache Software Foundation Member)
 * Jakob Homan (OfferUp, Apache Hadoop Committer/PMC Member, Apache Kafka Committer/PMC Member,
Apache Samza Committer/PMC Member, Apache Giraph Committer/PMC Member, Apache Software Foundation
Member)

=== Sponsoring Entity ===

We are requesting the Incubator to sponsor this project.
