airflow-dev mailing list archives

From "Ryabchuk, Pavlo" <ext-pavlo.ryabc...@here.com>
Subject RE: Airflow 2.0
Date Mon, 21 Nov 2016 12:59:51 GMT
-1. We rely heavily on data profiling as a pipeline health monitoring tool.

-----Original Message-----
From: Chris Riccomini [mailto:criccomini@apache.org] 
Sent: Saturday, November 19, 2016 1:57 AM
To: dev@airflow.incubator.apache.org
Subject: Re: Airflow 2.0

> RIP out the charting application and the data profiler

Yes please! +1

On Fri, Nov 18, 2016 at 2:41 PM, Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:
> Another point that may be controversial for Airflow 2.0: RIP out the 
> charting application and the data profiler. Even though it's nice to 
> have it there, it's just out of scope and has major security issues/implications.
>
> I'm not sure how popular it actually is. We may need to run a survey
> at some point about this kind of question.
>
> Max
>
> On Fri, Nov 18, 2016 at 2:39 PM, Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:
>
>> Using FAB's Model, we get pretty much all of that (REST API, auth/perms,
>> CRUD) for free:
>> http://flask-appbuilder.readthedocs.io/en/latest/quickhowto.html?highlight=rest#exposed-methods
>>
>> I'm pretty intimate with FAB since I use it (and contributed to it) 
>> for Superset/Caravel.
>>
>> All that's needed is to derive from FAB's model class instead of
>> SQLAlchemy's model class (FAB's model wraps it, adds functionality, and
>> is 100% compatible with it AFAICT).
>>
>> Max
>>
>> On Fri, Nov 18, 2016 at 2:07 PM, Chris Riccomini <criccomini@apache.org> wrote:
>>
>>> > It may be doable to run this as a different package, `airflow-webserver`,
>>> > as an alternate UI at first, and eventually to rip the old UI out of the
>>> > main package.
>>>
>>> This is the same strategy that I was thinking of for AIRFLOW-85. You 
>>> can build the new UI in parallel, and then delete the old one later. 
>>> I really think that a REST interface should be a pre-req to any 
>>> large/new UI changes, though. Getting unified so that everything is 
>>> driven through REST will be a big win.
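To make "everything driven through REST" concrete, here is a minimal stdlib WSGI sketch. The only way to read or change DAG-run state is through JSON endpoints, so a browser UI and a CLI would both be clients of the same API. The `/api/dag_runs` paths, the in-memory `DAG_RUNS` store, and the `clear` action are all hypothetical. This is not Airflow's actual API.

```python
import json

# Hypothetical in-memory store standing in for the metadata database.
DAG_RUNS = {
    "1": {"dag_id": "daily_etl", "state": "failed"},
    "2": {"dag_id": "hourly_sync", "state": "success"},
}


def _json(start_response, status, payload):
    """Serialize payload as a JSON response."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ, start_response):
    """WSGI app in which every UI action maps to one REST call."""
    method = environ["REQUEST_METHOD"]
    parts = [p for p in environ.get("PATH_INFO", "").split("/") if p]

    if parts[:2] != ["api", "dag_runs"]:
        return _json(start_response, "404 Not Found", {"error": "not found"})

    # GET /api/dag_runs lists every run.
    if method == "GET" and len(parts) == 2:
        return _json(start_response, "200 OK", DAG_RUNS)

    if len(parts) >= 3 and parts[2] not in DAG_RUNS:
        return _json(start_response, "404 Not Found", {"error": "no such run"})

    # GET /api/dag_runs/<id> shows a single run.
    if method == "GET" and len(parts) == 3:
        return _json(start_response, "200 OK", DAG_RUNS[parts[2]])

    # POST /api/dag_runs/<id>/clear resets the run's state (hypothetical action).
    if method == "POST" and len(parts) == 4 and parts[3] == "clear":
        DAG_RUNS[parts[2]]["state"] = None
        return _json(start_response, "200 OK", DAG_RUNS[parts[2]])

    return _json(start_response, "405 Method Not Allowed", {"error": "unsupported"})
```

For local experimentation, the app can be served with `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()`. A new UI built in parallel would then call these endpoints instead of touching the database directly.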
>>>
>>> On Fri, Nov 18, 2016 at 1:51 PM, Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:
>>> > A multi-tenant UI with composable roles on top of granular permissions.
>>> >
>>> > Migrating from Flask-Admin to Flask App Builder would be an easy-ish win
>>> > (since they're both Flask). FAB provides a good authentication and
>>> > permission model that ships out of the box with a REST API. It suffices
>>> > to define FAB models (derivatives of SQLAlchemy's model) and you get a
>>> > set of perms for each model (can_show, can_list, can_add, can_change,
>>> > can_delete, ...) and a set of CRUD REST endpoints. It would also allow us
>>> > to rip the authentication backend code out of Airflow and rely on FAB for
>>> > that. Also, every single view gets permissions auto-created for it, and
>>> > there are easy ways to define row-level filters based on user
>>> > permissions.
>>> > It may be doable to run this as a different package, `airflow-webserver`,
>>> > as an alternate UI at first, and eventually to rip the old UI out of the
>>> > main package.
>>> >
>>> > https://flask-appbuilder.readthedocs.io/en/latest/
>>> >
>>> > I'd love to carve some time and lead this.
>>> >
>>> > On Fri, Nov 18, 2016 at 1:32 PM, Chris Riccomini <criccomini@apache.org> wrote:
>>> >
>>> >> Full-fledged REST API (that the UI also uses) would be great in 2.0.
>>> >>
>>> >> On Fri, Nov 18, 2016 at 6:26 AM, David Kegley <kegs@b23.io> wrote:
>>> >> > Hi All,
>>> >> >
>>> >> > We have been using Airflow heavily for the last couple of months and it’s
>>> >> > been great so far. Here are a few things we’d like to see prioritized in
>>> >> > 2.0.
>>> >> >
>>> >> > 1) Role-based access to DAGs:
>>> >> > We would like to see better role-based access through the UI. There’s a
>>> >> > related ticket out there, but it hasn’t seen any action in a few months:
>>> >> > https://issues.apache.org/jira/browse/AIRFLOW-85
>>> >> >
>>> >> > We use a templating system to create/deploy DAGs dynamically based on
>>> >> > some directory/file structure. This allows analysts to quickly deploy
>>> >> > and schedule their ETL code without having to interact with the Airflow
>>> >> > installation directly. It would be great if those same analysts could
>>> >> > access their own DAGs in the UI so that they can clear DAG runs, mark
>>> >> > success, etc., while keeping them away from our core ETL and other
>>> >> > people's/organizations' DAGs. Some of this can be accomplished with
>>> >> > ‘filter by owner’, but it doesn’t address the use case where a DAG is
>>> >> > maintained by multiple users in the same organization who have separate
>>> >> > Airflow user accounts.
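One way to cover the multiple-maintainer case is to derive an access group from the directory layout the templating system already uses. Visibility then depends on group membership rather than a single owner. The sketch below assumes a hypothetical `<organization>/<dag_name>/<task>.sql` layout and a hypothetical `user_groups` mapping. It only builds plain DAG specs and does not call Airflow.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def build_dag_specs(paths):
    """Group task files into DAG specs keyed by '<org>.<dag_name>'.

    Assumes a hypothetical '<org>/<dag_name>/<task>.sql' layout; the
    top-level directory becomes the DAG's access group.
    """
    specs = defaultdict(lambda: {"group": None, "tasks": []})
    for raw in paths:
        path = PurePosixPath(raw)
        if len(path.parts) != 3 or path.suffix != ".sql":
            continue  # ignore files outside the expected layout
        org, dag_name, _ = path.parts
        spec = specs[f"{org}.{dag_name}"]
        spec["group"] = org
        spec["tasks"].append(path.stem)
    return dict(specs)


def dags_visible_to(user, specs, user_groups):
    """Group-based visibility: every member of a DAG's group can see it,
    unlike 'filter by owner', which matches a single account."""
    groups = user_groups.get(user, set())
    return sorted(dag_id for dag_id, spec in specs.items() if spec["group"] in groups)
```

With this shape, two analysts in the same organization who hold separate accounts both see that organization's DAGs. The core ETL DAGs stay hidden from them.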
>>> >> >
>>> >> > 2) An option to turn off backfill:
>>> >> > https://issues.apache.org/jira/browse/AIRFLOW-558
>>> >> > This is for cases where a DAG does an insert overwrite on a table every
>>> >> > day. This might be a realistic option for the current version, but I just
>>> >> > wanted to call attention to this feature request.
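Here is why backfill is wasted work for a daily insert-overwrite job. The small sketch below compares the daily intervals a scheduler would queue with backfill on and off. Each run rewrites the whole table, so only the newest interval matters. `schedule_intervals` and its `backfill` flag are hypothetical. The sketch also assumes that a run for day D starts only once D has fully elapsed.

```python
from datetime import date, timedelta


def schedule_intervals(start, today, last_run=None, backfill=True):
    """Return the daily execution dates to queue.

    The interval for day D is complete once D + 1 day <= today, so the
    most recent runnable date is today - 1 day.
    """
    latest = today - timedelta(days=1)
    first = start if last_run is None else last_run + timedelta(days=1)
    if first > latest:
        return []  # nothing new to run
    if not backfill:
        # Insert-overwrite jobs only need the newest partition.
        return [latest]
    days = (latest - first).days
    return [first + timedelta(days=i) for i in range(days + 1)]
```

Take a DAG with `start=date(2016, 11, 1)` evaluated on `date(2016, 11, 18)`. With backfill it queues 17 runs (Nov 1 through Nov 17). Without it, it queues only Nov 17.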
>>> >> >
>>> >> > Best,
>>> >> > David
>>> >> >
>>> >> > On Nov 17, 2016, at 6:19 PM, Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:
>>> >> >
>>> >> > *This is a brainstorm email thread about Airflow 2.0!*
>>> >> >
>>> >> > I wanted to share some ideas around what I would like to do in Airflow
>>> >> > 2.0 and would love to hear what others are thinking. I'll compile the
>>> >> > ideas that are shared in this thread in a Wiki once the conversation
>>> >> > fades.
>>> >> >
>>> >> > -------------------------------------------
>>> >> >
>>> >> > First idea, to get the conversation started:
>>> >> >
>>> >> > *Breaking down the package*
>>> >> > `pip install airflow-common airflow-scheduler airflow-webserver airflow-operators-googlecloud ...`
>>> >> >
>>> >> > It seems to me like we're getting to a point where having different
>>> >> > repositories and different packages would make things much easier in all
>>> >> > sorts of ways. For instance, the web server is a lot less sensitive than
>>> >> > the scheduler, and changes to operators should/could be deployed at will,
>>> >> > independently from the main package. People could upgrade only certain
>>> >> > packages in their environment when needed. Travis builds would be more
>>> >> > targeted and take less time, ...
>>> >> >
>>> >> > Also, the whole current `extras_require` approach to optional
>>> >> > dependencies (in setup.py) is kind of getting out of hand.
>>> >> >
>>> >> > Of course, `pip install airflow` would bring in a collection of
>>> >> > sub-packages similar in functionality to what it does now, perhaps
>>> >> > without so many operators you probably don't need in your environment.
>>> >> >
>>> >> > The release process is the main pain point and the biggest risk for the
>>> >> > project, and I feel like this is a solid solution to address it.
>>> >> >
>>> >> > Max
>>> >> >