airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Meickle <jmeic...@quantopian.com.INVALID>
Subject Re: [2.0 spring cleaning] Deprecate adding Operators and Hooks via plugins?
Date Mon, 15 Apr 2019 13:23:40 GMT
The way I'm thinking about this issue is like this:

- What features can plugins currently provide
- What non-plugin features in Airflow are currently hard to extend in a way
that would benefit from pluggability
- Out of the union of that feature set: which of them can be easily done
via Python imports in DAGs (like operators); which can be done via
entrypoints; which can be done via classpaths (like the Logging Config
file); and which need more complicated machinery (a plugin definition) to
get registered?
- How can we make each of those paths useful/easy/debuggable for someone
who is new to Python? Anything based on Python imports is very complex to
understand particularly once venvs and other common multi-Python deploy
strategies get involved.

So to my mind, it's totally straightforward that we should not use plugins
for operators. It's unclear to me what to do with the rest of the
functionality that plugins provide, without a comparison of alternatives.

On Sun, Apr 14, 2019 at 3:28 PM Driesprong, Fokko <fokko@driesprong.frl>
wrote:

> I actually never used the plugin framework, and used standard Python
> imports. I'm all in for removing it since it is very confusing for
> newcomers.
>
> Cheers, Fokko
>
> Op zo 14 apr. 2019 om 10:26 schreef Ash Berlin-Taylor <ash@apache.org>:
>
> > There is something to be said for the ease of putting plugins in the
> > airflow home folder too that I would still like to keep, as it makes
> > deploying site specific plugins much easier.
> >
> > On 14 April 2019 04:13:52 BST, Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > wrote:
> > >+1 for using more entrypoints as discovery mechanism. Maybe we should
> > >also
> > >start to use the 'console_scripts' entrypoint group for "airflow"
> > >binary
> > >instead of copying the binary airflow script (
> > >
> >
> https://python-packaging.readthedocs.io/en/latest/command-line-scripts.html#the-console-scripts-entry-point
> > >) as well as for plugginable extensions.
> > >
> > >J.
> > >
> > >On Sat, Apr 13, 2019 at 11:15 PM Maxime Beauchemin <
> > >maximebeauchemin@gmail.com> wrote:
> > >
> > >> I'd say it could be great to deprecate the whole plugin system and
> > >use
> > >> Python's "entry points" instead. I just didn't know that was an
> > >option and
> > >> the standard way to do this when I originally wrote it... The current
> > >> plugin system is a minefield for circular dependencies...
> > >>
> > >> On Sat, Apr 13, 2019 at 10:50 AM Jarek Potiuk
> > ><Jarek.Potiuk@polidea.com>
> > >> wrote:
> > >>
> > >> > One comment here - The current connections.py file screams to get
> > >dynamic
> > >> > discovery similar to what sqlalchemy does for dialects (for
> > >example).
> > >> >
> > >> > But on the other hand, I am not sure if this "registration" is
> > >really
> > >> > needed. Maybe I am totally wrong, but somehow I don't find it
> > >> particularly
> > >> > useful to get hook by connection_type.
> > >> >
> > >> > I believe connection_type is useful for the UI to be able to select
> > >the
> > >> > connection when you create it and have different UI fields defined
> > >by
> > >> > connection. But nearly universally in other places, when operator
> > >is
> > >> > instantiated, it also instantiates the hook it expects to use,
> > >rather
> > >> than
> > >> > rely on get_hook() or parse_from_uri() of the connection. Each
> > >operator
> > >> > knows what connection type it needs.
> > >> >
> > >> > I checked it quickly and it seems the "get by connection type" or
> > >> > parse_from_uri functionality is almost exclusively used by tests.
> > >> >
> > >> > Again - maybe I am completely wrong but maybe we should rethink the
> > >whole
> > >> > "connection" class and get rid of the need of "registering" new
> > >> connection
> > >> > types at all (or move the relevant stuff to be UI only).
> > >> >
> > >> >
> > >> >
> > >> > On Fri, Apr 12, 2019 at 6:07 PM Daniel Imberman <
> > >> daniel.imberman@gmail.com
> > >> > >
> > >> > wrote:
> > >> >
> > >> > > I think that's the important distinction. This wouldn't be
> > >removing
> > >> > custom
> > >> > > hooks and operators, this is just removing the python magic that
> > >has
> > >> you
> > >> > > placing them in a plugins folder. I can see the value in having
> > >the
> > >> DAGs
> > >> > > being real-time parsed as users might want to make small changes
> > >to
> > >> > those,
> > >> > > but operators/hooks should be fairly static once created.
> > >> > >
> > >> > > On Fri, Apr 12, 2019 at 8:53 AM Ash Berlin-Taylor
> > ><ash@apache.org>
> > >> > wrote:
> > >> > >
> > >> > > > Can you give an example?
> > >> > > >
> > >> > > > We can still have custom hooks/operators per install, I'm
just
> > >saying
> > >> > > they
> > >> > > > don't need to be implemented/"registered" with airflow.plugin
> > >-- all
> > >> > that
> > >> > > > provides is another way of having that be imported.
> > >> > > >
> > >> > > > -ash
> > >> > > >
> > >> > > > > On 12 Apr 2019, at 16:21, Chen Tong <cixuuz@gmail.com>
wrote:
> > >> > > > >
> > >> > > > > IMHO, it's worth to have such dynamic hooks adding
ability. I
> > >met
> > >> > > issues
> > >> > > > > before that the current hook cannot satisfy the needs.
I have
> > >to
> > >> > write
> > >> > > my
> > >> > > > > own hook and hacked Connections. If hook is not coupled
> > >tightly
> > >> with
> > >> > > > > Connections, it will be easier by switching to another
python
> > >> > package.
> > >> > > > >
> > >> > > > > On Fri, Apr 12, 2019 at 11:14 AM Daniel Imberman <
> > >> > > > daniel.imberman@gmail.com>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > >> I am all for this. The only complexity is ensuring
the
> > >> operator/hook
> > >> > > > >> exists in the workers too, but overall highly for.
> > >> > > > >> On Apr 12, 2019, 7:37 AM -0700, James Meickle <
> > >> > > jmeickle@quantopian.com
> > >> > > > .invalid>,
> > >> > > > >> wrote:
> > >> > > > >>> YES - I strongly agree with this! I first did
it this way
> > >> because I
> > >> > > > >> wanted
> > >> > > > >>> to follow the instructions, assuming there
was some Airflow
> > >> magic,
> > >> > > and
> > >> > > > >>> later found it really frustrating to maintain.
We should be
> > >clear
> > >> > > that
> > >> > > > >>> standard Python packaging is the way to go.
> > >> > > > >>>
> > >> > > > >>> That being said, what if Airflow had an official
> > >Cookiecutter
> > >> > > template
> > >> > > > >> for
> > >> > > > >>> Airflow operator packages, and suggested that
as a way to
> > >manage
> > >> > your
> > >> > > > >>> operators in the docs? That would probably
help people who
> > >aren't
> > >> > as
> > >> > > > >>> familiar with Python, and over time might increase
quality
> > >of
> > >> third
> > >> > > > party
> > >> > > > >>> operators (if it ships by default with linting/etc.)
> > >> > > > >>>
> > >> > > > >>> On Fri, Apr 12, 2019 at 3:54 AM Ash Berlin-Taylor
<
> > >> ash@apache.org>
> > >> > > > >> wrote:
> > >> > > > >>>
> > >> > > > >>>> This is something I've said a number of
times (but
> > >possibly
> > >> never
> > >> > on
> > >> > > > >> the
> > >> > > > >>>> list):
> > >> > > > >>>>
> > >> > > > >>>> I think we should deprecate adding Operators
and Hooks via
> > >the
> > >> > > Airflow
> > >> > > > >>>> plugin mechanism.
> > >> > > > >>>>
> > >> > > > >>>> I think plugins should be reserved for
any mechanism that
> > >a
> > >> > plain-ol
> > >> > > > >>>> python module import won't work for (which
is basically
> > >anything
> > >> > > that
> > >> > > > >> needs
> > >> > > > >>>> to tie deeply in to the Webserver or Scheduler
process).
> > >> > > > >>>>
> > >> > > > >>>> To that endI think we should deprecate
adding operators
> > >via
> > >> > plugins:
> > >> > > > >>>>
> > >> > > > >>>> from airflow.operators.my_plugin import
MyOperator
> > >> > > > >>>>
> > >> > > > >>>> can become
> > >> > > > >>>>
> > >> > > > >>>> from my_plugin import MyOperator
> > >> > > > >>>>
> > >> > > > >>>> with no impact on functionality.
> > >> > > > >>>>
> > >> > > > >>>> The same could be said for hooks, with
one possible
> > >feature
> > >> > > addition:
> > >> > > > >>>> Right now the list of connection types
in the Connections
> > >screen
> > >> > is
> > >> > > > >>>> hard-coded. Is it possible worth making
that dynamic
> > >somehow
> > >> based
> > >> > > on
> > >> > > > >> the
> > >> > > > >>>> Hook classes that are loaded? i.e. adding
the ability for
> > >hooks
> > >> > > added
> > >> > > > >> in
> > >> > > > >>>> plguins to add items to that list?
> > >> > > > >>>>
> > >> > > > >>>> -ash
> > >> > > > >>
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> >
> > >> > Jarek Potiuk
> > >> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >> >
> > >> > M: +48 660 796 129 <+48660796129>
> > >> > E: jarek.potiuk@polidea.com
> > >> >
> > >>
> > >
> > >
> > >--
> > >
> > >Jarek Potiuk
> > >Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >
> > >M: +48 660 796 129 <+48660796129>
> > >E: jarek.potiuk@polidea.com
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message