airflow-dev mailing list archives

From siddharth anand <san...@apache.org>
Subject Re: Airflow 2.0
Date Sat, 19 Nov 2016 01:34:44 GMT
David,
https://issues.apache.org/jira/browse/AIRFLOW-558 (i.e.
https://github.com/apache/incubator-airflow/pull/1830) is on my plate. It has
already gone through many rounds of reviews, testing, and fixes with the
submitter and does not need to wait until 2.0. We should be able to merge it
soon. BTW, you are encouraged to vote on these PRs so maintainers can
prioritize their time.

Max,

Thanks for kicking off this thread.

Regarding 2.0, we've associated feature deprecation and non-backward
compatible changes with 2.0. Some of this work might be pretty
earth-shaking to Airflow users. IMHO, changes that increase user pain at
upgrade time need to be carefully balanced against value.

Watching both Gitter and the email list, there are a variety of stumbling
points (for new users) that many of us who have been using the product for
1-2 years have forgotten. A fair number of people still mention that
getting Airflow up and running is no simple task - e.g. Alex mentioned this
in his talk at the last meet-up. The recent BlueYonder talk referenced
https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls

Though we may be numerically near 2.0 in terms of release numbers, I'd
prefer to prioritize a few things higher than releasing 2.0. We need to
build and exercise a few necessary muscles: timely PR processing & timely
Apache releases (i.e. quarterly). Beyond that, I'd like to prioritize the
"common pitfall" problems to ease on-boarding. Some of these don't need to
wait for a major release. The ones that do can be developed on a separate
2.0 branch and baked, reviewed, and voted on by the community before we
consider dropping them into master.

That way, we can keep master healthy to support the increasing rate of
community-submitted PRs that we are seeing and reduce the cycle time of
cutting stable releases, all while working on big-bang changes for 2.0
independently.

Just my $0.02
-s

On Fri, Nov 18, 2016 at 3:57 PM, Chris Riccomini <criccomini@apache.org>
wrote:

> > RIP out the charting application and the data profiler
>
> Yes please! +1
>
> On Fri, Nov 18, 2016 at 2:41 PM, Maxime Beauchemin
> <maximebeauchemin@gmail.com> wrote:
> > Another point that may be controversial for Airflow 2.0: RIP out the
> > charting application and the data profiler. Even though it's nice to have
> > it there, it's just out of scope and has major security
> > issues/implications.
> >
> > I'm not sure how popular it actually is. We may need to run a survey at
> > some point around these kinds of questions.
> >
> > Max
> >
> > On Fri, Nov 18, 2016 at 2:39 PM, Maxime Beauchemin <
> > maximebeauchemin@gmail.com> wrote:
> >
> >> Using FAB's Model, we get pretty much all of that (REST API, auth/perms,
> >> CRUD) for free:
> >> http://flask-appbuilder.readthedocs.io/en/latest/quickhowto.html?highlight=rest#exposed-methods
> >>
> >> I'm pretty intimate with FAB since I use it (and contributed to it) for
> >> Superset/Caravel.
> >>
> >> All that's needed is to derive from FAB's model class instead of SqlAlchemy's
> >> model class (which FAB's model wraps and adds functionality to and is 100%
> >> compatible AFAICT).
> >>
> >> Max
> >>
> >> On Fri, Nov 18, 2016 at 2:07 PM, Chris Riccomini <criccomini@apache.org>
> >> wrote:
> >>
> >>> > It may be doable to run this as a different package `airflow-webserver`, an
> >>> > alternate UI at first, and to eventually rip out the old UI off of the
> >>> > main package.
> >>>
> >>> This is the same strategy that I was thinking of for AIRFLOW-85. You
> >>> can build the new UI in parallel, and then delete the old one later. I
> >>> really think that a REST interface should be a pre-req to any
> >>> large/new UI changes, though. Getting unified so that everything is
> >>> driven through REST will be a big win.
> >>>
> >>> On Fri, Nov 18, 2016 at 1:51 PM, Maxime Beauchemin
> >>> <maximebeauchemin@gmail.com> wrote:
> >>> > A multi-tenant UI with composable roles on top of granular permissions.
> >>> >
> >>> > Migrating from Flask-Admin to Flask App Builder would be an easy-ish win
> >>> > (since they're both Flask). FAB provides a good authentication and
> >>> > permission model that ships out-of-the-box with a REST API. It suffices to
> >>> > define FAB models (derivative of SQLAlchemy's model) and you get a set of
> >>> > perms for the model (can_show, can_list, can_add, can_change, can_delete,
> >>> > ...) and a set of CRUD REST endpoints. It would also allow us to rip the
> >>> > authentication backend code out of Airflow and rely on FAB for that.
> >>> > Also every single view gets permissions auto-created for it, and there are
> >>> > easy ways to define row-level type filters based on user permissions.
> >>> >
> >>> > It may be doable to run this as a different package `airflow-webserver`, an
> >>> > alternate UI at first, and to eventually rip out the old UI off of the
> >>> > main package.
> >>> >
> >>> > https://flask-appbuilder.readthedocs.io/en/latest/
> >>> >
> >>> > I'd love to carve some time and lead this.
> >>> >
> >>> > On Fri, Nov 18, 2016 at 1:32 PM, Chris Riccomini <criccomini@apache.org>
> >>> > wrote:
> >>> >
> >>> >> Full-fledged REST API (that the UI also uses) would be great in 2.0.
> >>> >>
> >>> >> On Fri, Nov 18, 2016 at 6:26 AM, David Kegley <kegs@b23.io> wrote:
> >>> >> > Hi All,
> >>> >> >
> >>> >> > We have been using Airflow heavily for the last couple months and it’s
> >>> >> > been great so far. Here are a few things we’d like to see prioritized in
> >>> >> > 2.0.
> >>> >> >
> >>> >> > 1) Role based access to DAGs:
> >>> >> > We would like to see better role based access through the UI. There’s a
> >>> >> > related ticket out there but it hasn’t seen any action in a few months:
> >>> >> > https://issues.apache.org/jira/browse/AIRFLOW-85
> >>> >> >
> >>> >> > We use a templating system to create/deploy DAGs dynamically based on
> >>> >> > some directory/file structure. This allows analysts to quickly deploy and
> >>> >> > schedule their ETL code without having to interact with the Airflow
> >>> >> > installation directly. It would be great if those same analysts could
> >>> >> > access their own DAGs in the UI so that they can clear DAG runs, mark
> >>> >> > success, etc. while keeping them away from our core ETL and other
> >>> >> > people's/organization's DAGs. Some of this can be accomplished with ‘filter
> >>> >> > by owner’ but it doesn’t address the use case where a DAG can be maintained
> >>> >> > by multiple users in the same organization when they have separate Airflow
> >>> >> > user accounts.
> >>> >> >
> >>> >> > 2) An option to turn off backfill:
> >>> >> > https://issues.apache.org/jira/browse/AIRFLOW-558
> >>> >> > For cases where a DAG does an insert overwrite on a table every day.
> >>> >> > This might be a realistic option for the current version but I just wanted
> >>> >> > to call attention to this feature request.
> >>> >> >
> >>> >> > Best,
> >>> >> > David
> >>> >> >
> >>> >> > On Nov 17, 2016, at 6:19 PM, Maxime Beauchemin
> >>> >> > <maximebeauchemin@gmail.com> wrote:
> >>> >> >
> >>> >> > *This is a brainstorm email thread about Airflow 2.0!*
> >>> >> >
> >>> >> > I wanted to share some ideas around what I would like to do in Airflow 2.0
> >>> >> > and would love to hear what others are thinking. I'll compile the ideas
> >>> >> > that are shared in this thread in a Wiki once the conversation fades.
> >>> >> >
> >>> >> > -------------------------------------------
> >>> >> >
> >>> >> > First idea, to get the conversation started:
> >>> >> >
> >>> >> > *Breaking down the package*
> >>> >> > `pip install airflow-common airflow-scheduler airflow-webserver
> >>> >> > airflow-operators-googlecloud ...`
> >>> >> >
> >>> >> > It seems to me like we're getting to a point where having different
> >>> >> > repositories and different packages would make things much easier in all
> >>> >> > sorts of ways. For instance the web server is a lot less sensitive than the
> >>> >> > scheduler, and changes to operators should/could be deployed at will,
> >>> >> > independently from the main package. People in their environment could
> >>> >> > upgrade only certain packages when needed. Travis builds would be more
> >>> >> > targeted, and take less time, ...
> >>> >> >
> >>> >> > Also, the whole current "extras_require" approach to optional dependencies
> >>> >> > (in setup.py) is kind of getting out-of-hand.
> >>> >> >
> >>> >> > Of course `pip install airflow` would bring in a collection of sub-packages
> >>> >> > similar in functionality to what it does now, perhaps without so many
> >>> >> > operators you probably don't need in your environment.
> >>> >> >
> >>> >> > The release process is the main pain-point and the biggest risk for the
> >>> >> > project, and I feel like this is a solid solution to address it.
> >>> >> > Max
> >>> >> >
> >>> >>
> >>>
> >>
> >>
>
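
The package split proposed at the start of the thread could look roughly like this, assuming a thin `airflow` meta-package that ships no code and only declares dependencies on the sub-packages. The sub-package names are the ones from Max's `pip install` example. The `meta_package_kwargs` helper, the `googlecloud` extra name, the choice of which packages count as core, and the unpinned requirements are all assumptions made for illustration, not an agreed design.

```python
"""Sketch of a thin `airflow` meta-package (hypothetical layout)."""

# Sub-packages named in the thread; treating these three as "core" is an assumption.
CORE_PACKAGES = ["airflow-common", "airflow-scheduler", "airflow-webserver"]

# Optional operator bundles, keyed by a hypothetical extra name.
OPERATOR_EXTRAS = {"googlecloud": ["airflow-operators-googlecloud"]}


def meta_package_kwargs(version):
    """Build keyword arguments that a setup.py could pass to setuptools.setup()."""
    return {
        "name": "airflow",
        "version": version,
        "packages": [],  # the meta-package ships no code of its own
        "install_requires": list(CORE_PACKAGES),
        "extras_require": {name: list(pkgs) for name, pkgs in OPERATOR_EXTRAS.items()},
    }


if __name__ == "__main__":
    kwargs = meta_package_kwargs("2.0.0")
    print(kwargs["install_requires"])
    print(kwargs["extras_require"])
```

With a layout like this, `pip install airflow` would pull in the core sub-packages, `pip install airflow[googlecloud]` would add the operator package, and each sub-package could be released on its own schedule.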

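To make David's second request concrete, here is a small sketch of what "turning off backfill" would mean for a daily DAG that was deployed some time after its start date. This is a simplified model, not Airflow's scheduler: the `runs_to_schedule` function and its `backfill` flag are hypothetical names, and real scheduling works on schedule intervals rather than plain calendar days.

```python
from datetime import date, timedelta


def runs_to_schedule(start, today, backfill=True):
    """Return the daily run dates a (simplified) scheduler would create.

    With backfill=True every day from `start` through `today` gets a run.
    With backfill=False only the most recent day is scheduled, which is
    enough for a job that overwrites the whole table on each run.
    """
    if today < start:
        return []
    if not backfill:
        return [today]
    days = (today - start).days
    return [start + timedelta(days=offset) for offset in range(days + 1)]


if __name__ == "__main__":
    start, today = date(2016, 11, 1), date(2016, 11, 18)
    print(len(runs_to_schedule(start, today)))            # all missed days plus today
    print(runs_to_schedule(start, today, backfill=False))  # only the latest day
```

For an insert-overwrite job the historical runs would each rewrite the same table, so skipping them saves work without changing the final result.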