airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Maxime Beauchemin <maximebeauche...@gmail.com>
Subject Re: Refactoring Connection
Date Tue, 10 Jan 2017 00:35:08 GMT
About *"you can't **easily add a connection type without modifying the
airflow code source*", can we start by listing out the places that need to
be touched up when adding a connection?

Here's what I could find:
https://github.com/apache/incubator-airflow/blob/master/
airflow/models.py#L516
https://github.com/apache/incubator-airflow/blob/master/
airflow/utils/db.py#L109
https://github.com/apache/incubator-airflow/blob/master/
airflow/models.py#L620

I didn't dig super deep but it seems like what we need is a configurable
connection configuration blob that informs the connection type name, the
verbose name and the related hook location.

Maybe a solution is to have a `CONNECTION_TYPES` object in settings.py that
could be altered in different environments. This CONNECTION_TYPE would
store the name, the verbose_name and the related hook's path as a string
`airflow.hooks.mysql_hook.MySqlHook` or something like that. It may not be
that far from what you were proposing, and whether it should be a class or
a data structure is arguable, but I'd like to keep configuration elements
easily serializable so I'd vote for a data structure using only primitives.

Max

On Mon, Jan 9, 2017 at 5:13 AM, Alex Van Boxel <alex@vanboxel.be> wrote:

> I was actually going to propose something different with entry-points, but
> your requirement beat me to it (but that's ok :-). Actually I think with
> this mechanism people would be able to extend Airflow connection mechanism
> (and later other stuff) by doing *pip install airflow-sexy-new-connection*
> (for example).
>
> On Mon, Jan 9, 2017 at 1:39 PM Gael Magnan <gaelmagnan@gmail.com> wrote:
>
> > Thank you for the read, I'm gonna look at it, it's probably gonna be
> better
> > that what I have.
> >
> > Point taken about the URI, I'll see if i can find something generic
> enough
> > to handle all those cases.
> >
> > Le lun. 9 janv. 2017 à 13:36, Alex Van Boxel <alex@vanboxel.be> a écrit
> :
> >
> > > Thanks a lot, yes it clarifies a lot and I do agree you really need to
> > hack
> > > inside Airflow to add a Connection type. While you're working at this
> > could
> > > you have a look at the standard python *entry-point mechanism* for
> > > registering Connection types/components.
> > >
> > > A good read on this:
> > >
> > >
> > http://docs.pylonsproject.org/projects/pylons-webframework/
> en/latest/advanced_pylons/entry_points_and_plugins.html
> > >
> > > My first though would be that just by adding an entry to the factory
> > method
> > > would be enough to register your Connection + ConnectionType and UI.
> > >
> > > Also note that not everything works with a URI. The Google Cloud
> > Connection
> > > doesn't have one, it uses a secret key file stored on disk, so don't
> > force
> > > every connection type to work with URI's.
> > >
> > >
> > >
> > > On Mon, Jan 9, 2017 at 1:15 PM Gael Magnan <gaelmagnan@gmail.com>
> wrote:
> > >
> > > > Yes sure,
> > > >
> > > > The question was the following:
> > > > "I was looking at the code of the connections, and I realized you
> can't
> > > > easily add a connection type without modifying the airflow code
> > source. I
> > > > wanted to create a mongodb connection type, but I think the best
> > approche
> > > > would be to refactor connections first. Thoughts anyone?"
> > > >
> > > > The answer of Bolke de Bruin was: "making it more generic would be
> > > > appreciated"
> > > >
> > > > So basically the way the code is set up actually every types of
> > > connection
> > > > existing is defined though a list in the Connection class. It
> > implements
> > > > exactly the same code for parsing uri to get connections info and
> > doesn't
> > > > allow for a simple way to get back the uri from the connection infos.
> > > >
> > > > I need to add a mongodb connection and a way to get it back as a uri,
> > so
> > > i
> > > > could use an other type of connection and play around with that or
> > juste
> > > > add one more hard coded connection type, but I though this might be
> > > > something that comes back regularly and having a simple way to plug
> in
> > > new
> > > > types of connection would make it easier for anyone to contribute a
> new
> > > > connection type.
> > > >
> > > > Hope this clarifies my proposal.
> > > >
> > > > Le lun. 9 janv. 2017 à 12:46, Alex Van Boxel <alex@vanboxel.be>
a
> > écrit
> > > :
> > > >
> > > > > Hey Gael,
> > > > >
> > > > > could you please recap the question here and provide some context.
> > Not
> > > > > everyone on the mailinglist is actively following Gitter, including
> > me.
> > > > > With some context it would be easier to give feedback. Thanks.
> > > > >
> > > > > On Mon, Jan 9, 2017 at 11:15 AM Gael Magnan <gaelmagnan@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > following my question on gitter the other day and the response
> from
> > > > Bolke
> > > > > > de Bruin, I've started working on refactoring the connections
in
> > > > airflow.
> > > > > >
> > > > > > Before submitting a PR I wanted to share my proposal with you
and
> > get
> > > > > > feedbacks.
> > > > > >
> > > > > > The idea is quite simple, I've divided the Connection class
in
> two,
> > > > > > Connection and ConnectionType, connection has the same interface
> it
> > > had
> > > > > > before plus a few methods, but the class keeps a reference to
a
> > > > > dictionary
> > > > > > of registered ConnectionType. It delegates the work of parsing
> from
> > > > URI,
> > > > > > formatting to URI (added) and getting the hook to the
> > ConnectionType
> > > > > > associated with the conn_type.
> > > > > >
> > > > > > I've thought of two ways of registering new ConnectionTypes,
the
> > > first
> > > > is
> > > > > > making the BaseConnectionType use a metaclass that registered
any
> > new
> > > > > > ConnectionType with Connection when the class is declared, it
> would
> > > > > require
> > > > > > the less work to extend the connection module, as just importing
> > the
> > > > file
> > > > > > with the connection would do the trick.
> > > > > > The second one is juste to have a function/classmethod that
you
> > call
> > > > > > manually to register your connection. It would be simpler to
> > > understand
> > > > > but
> > > > > > requires more work every time you create a new ConnectionType.
> > > > > >
> > > > > > Hope this proposal is clear enough, and I'm waiting for feebacks
> > and
> > > > > > possible improvements.
> > > > > >
> > > > > > Regards
> > > > > > Gael Magnan de Bornier
> > > > > >
> > > > > --
> > > > >   _/
> > > > > _/ Alex Van Boxel
> > > > >
> > > >
> > > --
> > >   _/
> > > _/ Alex Van Boxel
> > >
> >
> --
>   _/
> _/ Alex Van Boxel
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message