airflow-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Re: Merging the experimental API Framework
Date Tue, 29 Nov 2016 09:43:11 GMT
Flask App Builder looks great at first glance, and experience obviously counts, though I have
several concerns:

Authentication
- The Hadoop ecosystem, especially on-premise installs, depends on Kerberos for its security
integration. FAB does not support this out of the box.
- In addition, I would like to implement something along the lines of the Hadoop delegation
token. I.e. the client first authenticates with Kerberos and then gets a token for further
communication. This reduces stress on the KDC and allows us to let tasks get their connection
details from the API instead of accessing the database directly. Imagine a (Celery) worker getting
a token on behalf of the task; the task uses this token to communicate with the API to get the
connection details it needs. The token has a limited lifetime and automatically expires when the
task finishes. This requires a custom authentication layer in FAB (a rough sketch follows below).
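
To make that concrete, here is a minimal sketch of such a token layer. The function names and
the use of itsdangerous for signing are assumptions for illustration only; none of this is an
existing FAB or Airflow API:

    # Illustrative delegation-token layer; names are made up, not FAB features.
    from itsdangerous import URLSafeTimedSerializer, BadSignature, SignatureExpired

    serializer = URLSafeTimedSerializer("api-secret-key")

    def issue_token(kerberos_principal, task_id):
        # Called once, after the worker has authenticated with Kerberos on
        # behalf of the task; later calls present the token, not Kerberos.
        return serializer.dumps({"principal": kerberos_principal, "task_id": task_id})

    def validate_token(token, max_age=3600):
        # The API validates the token locally, so the KDC is not hit again;
        # expired or tampered tokens are simply rejected.
        try:
            return serializer.loads(token, max_age=max_age)
        except (BadSignature, SignatureExpired):
            return None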

Non-JSON returns for service-to-service communication (or in general)
- The second point above concerns a non-public API that will be used extensively, so JSON as a
return type is not efficient from several perspectives: 1) message size, 2) schema enforcement.
Avro or Protobuf seem to fit the bill much better. From a first look I couldn’t figure out if FAB
can do this easily.
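
As an illustration of what a binary return could look like, here is a rough sketch of a plain
Flask view that serialises a connection record as Avro using fastavro. The route, schema, and
values are invented for the example; this is not something FAB provides:

    import io

    from fastavro import parse_schema, schemaless_writer
    from flask import Flask, Response

    app = Flask(__name__)

    # Hypothetical Avro schema for a connection record.
    CONNECTION_SCHEMA = parse_schema({
        "type": "record",
        "name": "Connection",
        "fields": [
            {"name": "conn_id", "type": "string"},
            {"name": "host", "type": "string"},
            {"name": "login", "type": "string"},
            {"name": "port", "type": "int"},
        ],
    })

    @app.route("/api/connections/<conn_id>")
    def get_connection(conn_id):
        # In reality this would come from the metadata database.
        record = {"conn_id": conn_id, "host": "db.example.com",
                  "login": "airflow", "port": 5432}
        buf = io.BytesIO()
        schemaless_writer(buf, CONNECTION_SCHEMA, record)
        return Response(buf.getvalue(), mimetype="avro/binary")

The same shape would work for Protobuf; the point is just that the endpoint returns a compact,
schema-checked binary payload instead of JSON.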

Going a bit off-topic, I imagine Airflow split into separate, loosely coupled packages (as
I guess you described in the Airflow 2.0 thread):

airflow-client
Brings you the CLI and client libraries that allow you to integrate Airflow with your own
programs. You can run the client from anywhere because it will look up its endpoint via DNS
(e.g. _airflow._tcp.example.com. 86400 IN SRV 0 5 443 airflow.example.com); see the sketch
after this list.

airflow-api
API server for service-to-service traffic and external endpoints. Checks its own registration in DNS.

airflow-scheduler
Scheduler only (no LocalExecutor Workers)

airflow-worker
Celery / Local / Mesos 

Absent: the airflow webserver; serving static files can be handled by Apache/nginx.
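
For the DNS lookup mentioned under airflow-client, here is a sketch of how the client could
resolve the SRV record. The record name matches the example above; dnspython and the helper
name are assumptions for illustration:

    import dns.resolver

    def discover_api_endpoint(domain="example.com"):
        # Look up the SRV record published for the API server.
        answers = dns.resolver.resolve("_airflow._tcp." + domain, "SRV")
        # The lowest priority value wins for SRV records.
        record = min(answers, key=lambda r: r.priority)
        return str(record.target).rstrip("."), record.port

    host, port = discover_api_endpoint()
    print("Airflow API at https://%s:%d" % (host, port))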

So yes, let's talk :-).

Bolke


> On 29 Nov 2016, at 01:14, Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:
> 
> Glad to see this! On my side I've been playing around trying to use Flask
> App Builder (I just obtained committer status on the project), which covers
> out-of-the-box CRUD REST APIs for models and an
> authentication/role/permission framework that would allow for a
> multi-tenant UI / API with arbitrary roles and granular access.
> 
> We should chat about how these solutions might work together. My early work
> and some of the rationale for it can be found here:
> https://github.com/mistercrunch/airflow_webserver
> 
> We should chat! Let's schedule time.
> 
> Max
> 
> On Mon, Nov 28, 2016 at 11:36 AM, Andrew Phillips <andrewp@apache.org>
> wrote:
> 
>>> Just wanted to say this is very exciting, thank you Bolke :).
>>
>> Big +1 to that. Thanks, Bolke!
>> 
>> ap
>> 

