airflow-dev mailing list archives

From Maxime Beauchemin <maximebeauche...@gmail.com>
Subject Re: Git webhooks with Airflow
Date Thu, 08 Sep 2016 15:39:06 GMT
Hi Vijay,

Until recently, we assumed that people already had their own way of
syncing GitHub repos across their infrastructure. In our case at Airbnb
it's Chef; pretty much every company has its own way of doing this, and it
is a requirement for running Airflow in a distributed setup.

A related item on our roadmap is to add version semantics (git SHAs) to
the communication layer, so that workers would fetch shallow clones of the
DAG repository as of a specific version. We debated between some form of
serialization and this approach, and decided to fully embrace configuration
as code and shy away from serialization / artifact management, which
brings many challenges and limitations, especially in Python.
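As a rough illustration of that roadmap item, here is a minimal sketch of how a worker might materialize the DAG repository at a pinned commit. It is not Airflow code: the function names and the per-SHA cache layout are hypothetical, and it assumes the Git server allows fetching a commit directly by SHA.

```python
import re
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import List

# Full 40-hex-digit commit SHA; branch names are rejected on purpose,
# because a branch can move while a SHA pins one immutable version.
_SHA_RE = re.compile(r"[0-9a-f]{40}")


def _check_sha(sha: str) -> None:
    if not _SHA_RE.fullmatch(sha):
        raise ValueError("expected a full 40-character commit SHA, got %r" % sha)


def shallow_checkout_commands(repo_url: str, sha: str, dest: str) -> List[List[str]]:
    """Return the git commands that check out exactly `sha` of `repo_url` into `dest`."""
    _check_sha(sha)
    return [
        ["git", "init", "--quiet", dest],
        ["git", "-C", dest, "remote", "add", "origin", repo_url],
        # Depth-1 fetch: only the pinned commit is transferred, no history.
        # Assumes the Git server allows fetching a commit directly by SHA.
        ["git", "-C", dest, "fetch", "--quiet", "--depth", "1", "origin", sha],
        ["git", "-C", dest, "checkout", "--quiet", "--detach", "FETCH_HEAD"],
    ]


def materialize_dag_repo(repo_url: str, sha: str, cache_root: str) -> Path:
    """Return a local checkout of the DAG repo at `sha`, fetching it if needed.

    Hypothetical helper: `cache_root` is a local directory the worker owns.
    """
    _check_sha(sha)
    dest = Path(cache_root) / sha
    if dest.exists():
        # A commit never changes, so an existing checkout is reused as-is.
        return dest
    # Build in a scratch directory and rename at the end, so a failed fetch
    # never leaves a half-populated checkout at `dest`.
    tmp = tempfile.mkdtemp(prefix=sha[:12] + "-", dir=cache_root)
    try:
        for cmd in shallow_checkout_commands(repo_url, sha, tmp):
            subprocess.run(cmd, check=True)
        Path(tmp).rename(dest)
    except Exception:
        shutil.rmtree(tmp, ignore_errors=True)
        if dest.exists():
            # Another worker created the same checkout first; use that one.
            return dest
        raise
    return dest
```

In this sketch, a worker that receives a (repo URL, SHA) pair along with a task would call `materialize_dag_repo` and parse DAGs from the returned directory. Keying the cache by SHA means each version is fetched at most once and never mutated afterwards.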

As we roll this change out, Airflow won't rely on external services to sync
up repos, and we'll have a solid story around versioning. Of course, that
implies that Git becomes a critical hotspot in the cluster. We're planning
to ship this feature as opt-in, at least until 2.0.

To the community: we'll share a formal design doc in the near future. In
the meantime, this thread can be a good place to discuss this solution at
a high level.

Thanks,

Max

On Wed, Sep 7, 2016 at 3:25 PM, Vijay Bhat <vijaysbhat@gmail.com> wrote:

> Hi all,
>
> First off, I want to thank the Airflow community for developing a fantastic
> data pipelining platform. I used Dataswarm extensively while I was at
> Facebook, and it's awesome to see most of the functionality available for
> the rest of the world to use in the form of Airflow.
>
> What I haven't found in the documentation is a prescribed way to connect
> the source control repo for the DAG code to the Airflow DAG folder to make
> sure the latest code changes are picked up by the scheduler. In the Airflow
> forums, I have seen people mention using cron, Chef, Puppet, etc., but no
> methods based on Git webhooks (https://developer.github.com/v3/repos/hooks/).
>
> Using webhooks would remove the need to poll the repo for changes. For
> example, Jenkins uses webhooks to auto-trigger builds:
> https://wiki.jenkins-ci.org/display/JENKINS/Github+Plugin#GithubPlugin-TriggerabuildwhenachangeispushedtoGitHub.
> Does Airflow have a way of configuring something similar?
>
> Thanks!
> Vijay
>
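For the webhook approach Vijay asks about, a minimal sketch of a standalone receiver that could run next to the scheduler might look like this. It is not part of Airflow. The secret, DAG folder path, branch, and port are hypothetical placeholders; it assumes the GitHub webhook is configured with content type `application/json` and that the DAG folder is an existing Git checkout with an upstream branch set, so `git pull --ff-only` works. It checks GitHub's `X-Hub-Signature-256` header before acting.

```python
import hashlib
import hmac
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical settings; adjust to your deployment.
WEBHOOK_SECRET = b"change-me"           # secret configured on the GitHub webhook
DAG_FOLDER = "/usr/local/airflow/dags"  # a git checkout of the DAG repository
BRANCH_REF = "refs/heads/master"        # only pushes to this branch trigger a sync


def verify_signature(secret: bytes, body: bytes, header: str) -> bool:
    """Check X-Hub-Signature-256, i.e. "sha256=" + hex HMAC-SHA256 of the body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison; bytes so odd header characters can't raise.
    return hmac.compare_digest(expected.encode(), (header or "").encode())


def should_sync(event: str, payload: dict) -> bool:
    """Only a push to the tracked branch should update the DAG folder."""
    return event == "push" and payload.get("ref") == BRANCH_REF


class WebhookHandler(BaseHTTPRequestHandler):
    def _reply(self, status: int) -> None:
        self.send_response(status)
        self.end_headers()

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        signature = self.headers.get("X-Hub-Signature-256", "")
        if not verify_signature(WEBHOOK_SECRET, body, signature):
            return self._reply(401)
        try:
            payload = json.loads(body or b"{}")
        except ValueError:
            return self._reply(400)
        if should_sync(self.headers.get("X-GitHub-Event", ""), payload):
            # Fast-forward the checkout the scheduler reads DAGs from.
            # Failures are not surfaced to GitHub here; log them in practice.
            subprocess.run(["git", "-C", DAG_FOLDER, "pull", "--ff-only"], check=False)
        self._reply(204)


def serve(port: int = 8081) -> None:
    """Run the receiver until interrupted (hypothetical default port)."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Calling `serve()` and pointing the GitHub webhook at that host and port would complete the loop. The pull runs inside the request handler only for simplicity; GitHub times out slow deliveries, so for a large repository it would be safer to hand the pull off to a background job.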
