airflow-dev mailing list archives

From Bolke de Bruin <>
Subject Re: Summary of committer meeting 2016-05-12
Date Fri, 13 May 2016 17:05:30 GMT

It was, but it wasn't broadly communicated. We will repeat it every week or two weeks, with an open invitation.

Now we need to figure out how to share a persistent video link that works without me or someone
else having to be there every time...


Sent from my iPhone

> On 13 May 2016, at 18:55, Jakob Homan <> wrote:
> Cool. Was this a public meeting? Will the next one be?
>> On 13 May 2016 at 08:20, Chris Riccomini <> wrote:
>> Hey Bolke,
>> Thanks for writing this up. I don't have a ton of feedback, as I'm not
>> terribly familiar with the internals of the scheduler, but two notes:
>> 1. A major +1 for the celery/local executor discussion. IMO, Celery is a
>> net-negative on this project, and should be fully removed in favor of the
>> LocalExecutor. Splitting the scheduler from the executor in the
>> LocalExecutor would basically give parity with Celery, AFAICT, and sounds
>> much easier to operate to me.
>> 2. If we are moving towards Docker as a container for DAG execution in the
>> future, it's probably worth considering how these changes are going to
>> affect the Docker implementation. If we do pursue (1), how does this look
>> in a Dockerized world? Is the executor going to still exist? Would the
>> scheduler interact directly with Kubernetes/Mesos instead?
>> Cheers,
>> Chris
>>> On Fri, May 13, 2016 at 3:41 AM, Bolke de Bruin <> wrote:
>>> Hi,
>>> We did a video conference on the scheduler with a couple of the committers
>>> yesterday. The meeting was not meant to finalize any roadmap but rather to
>>> get a general understanding of each other's work. To keep things as
>>> transparent as possible, here is a summary:
>>> Attendees:
>>> Max, Paul, Arthur, Dan, Sid, Bolke
>>> The discussion centered on the scheduler, sometimes diving into
>>> connected topics such as pooling and executors. Paul discussed his work on
>>> making the scheduler more robust against faulty DAGs and on making the
>>> scheduler faster by not making it dependent on the slowest-parsed DAG. PRs
>>> will follow shortly to open this work up to the community, as the aim is
>>> to have it in by the end of Q2 (no promises ;-)).
>>> Continuing the train of thought of making the scheduler faster, the
>>> separation of executor and scheduler was also discussed. Max remarked
>>> that doing this separation would essentially create the equivalent of
>>> the Celery workers. Sid mentioned that Celery seemed to be a common source
>>> of setup issues and that people tend to use the LocalExecutor instead. The
>>> discussion was parked, as it needs to be discussed with a wider audience
>>> (mailing list, community) and is not something we think is required in
>>> the near term (obviously PRs are welcome).
>>> Next, we discussed some of the scheduler issues that are marked in the
>>> attached document. The core issues discussed were 1) TaskInstances can be
>>> created without a DagRun, 2) non-intuitive behavior with start_date and
>>> also depends_on_past, and 3) lineage. It was agreed that the proposals to
>>> add a previous field to the DagRun model and to make backfills (among
>>> others) use DagRuns make sense. The lineage part drew more discussion, as
>>> it involves more in-depth changes, specifically to TaskInstances. Still,
>>> the consensus in the group was that it is necessary to take steps here and
>>> that they are long overdue.
>>> Lastly, we went through the draft scheduler roadmap (see doc) to check for
>>> any misalignments. While there are some differences in details, we
>>> think the steps are quite compatible and the differences can be worked out.
>>> So that was it; if I missed anything, please correct me. If you have
>>> questions, suggestions, etc., don't hesitate to put them on the list.
>>> Cheers
>>> Bolke
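
The proposal to add a previous field to the DagRun model, with backfills also going through DagRuns, can be illustrated with a small sketch. Everything below is hypothetical: `DagRun`, `create_run`, `depends_on_past_met` and the `task_states` mapping are illustrative names, not Airflow's actual models or API. The sketch only shows the idea that once every run knows its predecessor, a check such as depends_on_past can follow that link directly.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class DagRun:
    """Hypothetical, simplified DagRun; NOT Airflow's real model."""
    dag_id: str
    execution_date: datetime
    # Hypothetical "previous" field: link to the preceding run of the same DAG.
    previous: Optional["DagRun"] = None
    # Hypothetical per-run task state map: task_id -> "success", "failed", ...
    task_states: Dict[str, str] = field(default_factory=dict)


def create_run(dag_id: str, execution_date: datetime,
               last: Optional[DagRun]) -> DagRun:
    """Create a run linked to the last one; in this sketch, scheduled runs
    and backfills would both go through here, so no TaskInstance exists
    without a DagRun."""
    return DagRun(dag_id=dag_id, execution_date=execution_date, previous=last)


def depends_on_past_met(run: DagRun, task_id: str) -> bool:
    """Allow the task if there is no earlier run, or if the same task
    succeeded in the previous run (a simplifying assumption)."""
    if run.previous is None:
        return True
    return run.previous.task_states.get(task_id) == "success"


# Usage: two consecutive daily runs of a hypothetical DAG.
start = datetime(2016, 5, 1)
r1 = create_run("example_dag", start, None)
r1.task_states["extract"] = "success"
r2 = create_run("example_dag", start + timedelta(days=1), r1)
print(depends_on_past_met(r2, "extract"))  # True
```

Storing an explicit link, rather than recomputing "the previous run" from dates, is what would make the behavior independent of how start_date and the schedule line up. Whether the real model would hold a foreign key or something else was not decided in the meeting.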
