airflow-dev mailing list archives

From siddharth anand <r39...@gmail.com>
Subject Re: Summary of committer meeting 2016-05-12
Date Sat, 14 May 2016 00:37:04 GMT
I'm not familiar enough with Celery -- refer to my comment about giving up
after a day of playing with it -- to discount it totally. I'd actually feel
better informed once I got it running and could publish a "take these
steps" guide, which I'm surprised no one has done.

I'm all for simple, though I'm not sure "distributed executor" necessarily
falls in that camp. I'm open to any idea and PR, however.


-s

On Fri, May 13, 2016 at 10:40 PM, Chris Riccomini <criccomini@apache.org>
wrote:

> Hey Sid,
>
> I question the need for both local and celery executors (leaving
> sequential out of this). I think all we need is a scheduler + distributed
> executor. If you run only one of each, then you have the LocalExecutor. The
> main thing that I care about is that this one thing is easy out of the box
> and well tested. Right now, Celery is neither of those things.
>
> If we just had:
>
> airflow webserver
> airflow scheduler
> airflow executor
>
> I'd be happy. If `airflow executor` could start out SQLAlchemy/DB-backed
> (just like LocalExecutor), but could be upgraded (without forcing you to)
> to Redis or RabbitMQ or SQS or whatever, great.
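
For illustration, a rough sketch of the upgrade path being described here.
None of this exists today: the standalone `airflow executor` command is the
proposal itself, and the [executor] config section below is invented purely
for the sketch.

    # Hypothetical deployment, sketching the proposal (not current Airflow commands):
    airflow webserver                      # UI
    airflow scheduler                      # decides what should run, enqueues task instances
    airflow executor                       # dequeues and runs task instances

    # airflow.cfg -- hypothetical [executor] section
    [executor]
    backend = sqlalchemy                   # default: DB-backed, like LocalExecutor today
    # backend = redis://localhost:6379/0   # optional upgrade; Redis/RabbitMQ/SQS never required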
>
> I just want it easy and tested/stable.
>
> Cheers,
> Chris
>
> On Fri, May 13, 2016 at 11:57 AM, Siddharth Anand <sanand@apache.org>
> wrote:
>
>> Bolke, thanks for providing the document and for generally driving a path
>> forward.
>>
>> Regarding Local vs. Celery: I think the project benefits greatly from
>> having multiple executors. Widespread adoption of Airflow depends on
>> keeping the barriers to adoption as low as possible. We ship Airflow with
>> a SQLite DB and the SequentialExecutor so that someone can simply install
>> it on his/her laptop, run the examples, and immediately get familiar with
>> the awesome UI features. Soon after, he/she will want to run it on a test
>> machine and share it with colleagues/management. Since some UI features
>> don't work with SQLAlchemy/SQLite, if the engineer were to run the
>> SequentialExecutor, his/her colleagues would likely shoot the project
>> down. Hence, this engineer (let's call him/her our champion) will need to
>> switch to the LocalExecutor and run it against a non-SQLite DB. The
>> champion may need to spin up a single machine in the cloud or request a
>> machine from his/her Ops team for a POC. Once the champion demos Airflow
>> and people love it, people will start using it. The LocalExecutor is the
>> easiest to use, set up, and justify in terms of machine spend and
>> complexity for a budding project in a company. It is also possible that
>> scale never becomes an issue, in which case this level of setup was
>> already justified by the benefit to the company. BTW, at Agari, we didn't
>> have great Terraform and Ansible coverage when I started using Airflow --
>> we do now. As a result, setting up new machines in the cloud was a little
>> painful.
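
For readers following this path, a minimal sketch of that single-machine POC
stage, assuming a Postgres database is already available; the connection
string, port, and credentials are placeholders:

    # airflow.cfg
    [core]
    executor = LocalExecutor
    sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow

    # then, on the one machine:
    airflow initdb
    airflow webserver -p 8080
    airflow scheduler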
>> Now, once the company becomes dependent on Airflow and scale becomes a
>> major issue, it is wise to take the next architectural step, which in the
>> case of the CeleryExecutor means installing Redis and a bunch of Celery
>> components. By providing the lowest barrier to entry for each level of
>> company/developer commitment, we are simply recognizing how companies
>> work.
>>
>> My point in yesterday's conversation was that we need multiple executors,
>> and that any scheduler/core/executor changes made to the project need to
>> be tested against multiple executors.
>>
>> Also, another point I would like to raise: the documentation around
>> getting Airflow running with Celery is very poor. I'd really like to see
>> a tutorial with gotchas published. I tried setting it up for a day and
>> then dropped it, preferring to run 2 schedulers (with LocalExecutors) for
>> both increased scheduling redundancy and greater executor throughput. 10%
>> of our GitHub issues reflect this lack of documentation and insight. It's
>> great that Airbnb is using it, but there is no clear path for others to
>> follow. As a result, I suspect only a small minority run with Celery, and
>> they then run into pickling issues, Celery queue management/insight
>> questions, DAG sync problems (e.g. "my start date is not honored"), etc.
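
Not the gotcha tutorial being asked for here, but as a starting point, a
minimal sketch of a Celery setup based on the config keys and commands in
current releases; the Redis and Postgres hosts are placeholders and key
names may change:

    # airflow.cfg -- same file on the scheduler box and on every worker box
    [core]
    executor = CeleryExecutor

    [celery]
    broker_url = redis://redis-host:6379/0
    celery_result_backend = db+postgresql://airflow:airflow@db-host:5432/airflow

    # processes
    airflow scheduler    # on the scheduler box
    airflow worker       # on each worker box
    airflow flower       # optional: Celery monitoring UI

    # common gotcha: the DAG files themselves must also be synced to every worker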
>> -s
>>
>> On Friday, May 13, 2016 3:20 PM, Chris Riccomini <criccomini@apache.org>
>> wrote:
>>
>>
>>  Hey Bolke,
>>
>> Thanks for writing this up. I don't have a ton of feedback, as I'm not
>> terribly familiar with the internals of the scheduler, but two notes:
>>
>> 1. A major +1 for the celery/local executor discussion. IMO, Celery is a
>> net-negative on this project, and should be fully removed in favor of the
>> LocalExecutor. Splitting the scheduler from the executor in the
>> LocalExecutor would basically give parity with Celery, AFAICT, and sounds
>> much easier to operate to me.
>> 2. If we are moving towards Docker as a container for DAG execution in the
>> future, it's probably worth considering how these changes are going to
>> affect the Docker implementation. If we do pursue (1), how does this look
>> in a Dockerized world? Is the executor going to still exist? Would the
>> scheduler interact directly with Kubernetes/Mesos instead?
>>
>> Cheers,
>> Chris
>>
>> On Fri, May 13, 2016 at 3:41 AM, Bolke de Bruin <bdbruin@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > We did a video conference on the scheduler with a couple of the
>> > committers yesterday. The meeting was not there to finalize any roadmap,
>> > but more to get a general understanding of each other's work. To keep it
>> > as transparent as possible, hereby a summary:
>> >
>> > Who attended:
>> > Max, Paul, Arthur, Dan, Sid, Bolke
>> >
>> > The discussion centered around the scheduler, sometimes diving into
>> > connected topics such as pooling and executors. Paul discussed his work
>> > on making the scheduler more robust against faulty DAGs and also on
>> > making the scheduler faster by not making it dependent on the slowest
>> > parsed DAG. PR work will be provided shortly to open it up to the
>> > community, as the aim is to have this in by end of Q2 (no promises ;-)).
>> >
>> > Continuing that train of thought on making the scheduler faster, the
>> > separation of executor and scheduler was also discussed. Max remarked
>> > that doing this separation would essentially create the equivalent of
>> > the Celery workers. Sid mentioned that Celery seems to be a source of
>> > setup issues and that people tend to use the LocalExecutor instead. The
>> > discussion was parked as it needs to be discussed with a wider audience
>> > (mailing list, community) and is not something that we think is required
>> > in the near term (obviously PRs are welcome).
>> >
>> > Next, we discussed some of the scheduler issues that are marked in the
>> > attached document
>> > (https://drive.google.com/open?id=0B_Y7S4YFVWvYM1o0aDhKMjJhNzg). Core
>> > issues discussed were 1) TaskInstances can be created without a DagRun,
>> > 2) non-intuitive behavior with start_date and also depends_on_past, and
>> > 3) lineage. It was agreed that the proposal to add a "previous" field to
>> > the DagRun model and to make backfills (among others) use DagRun makes
>> > sense. More discussion was around the lineage part, as that involves
>> > more in-depth changes, specifically to TaskInstances. Still, the
>> > consensus in the group was that it is necessary to make steps here and
>> > that they are long overdue.
>> >
>> > Lastly, we discussed the draft scheduler roadmap (see doc) to see if
>> > there were any misalignments. While there are some differences in
>> > details, we think the steps are quite compatible and the differences
>> > can be worked out.
>> >
>> > So that was it; in case I missed anything, correct me. In case of
>> > questions, suggestions, etc., don't hesitate to put them on the list.
>> >
>> > Cheers,
>> > Bolke
>> >
>>
>>
>>
>
>
