airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Maxime Beauchemin <maximebeauche...@gmail.com>
Subject Re: Refactoring Connection
Date Tue, 10 Jan 2017 20:55:22 GMT
Note that the `Conection.extra` attribute was designed to store anything
that is specific to a connection type. Here's an example of how it's used
in the HiveHook:
https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/hive_hooks.py#L75

Also note that currently there's no enforcement of connection type that
match a hook, so you can use any connection with any hook. So you could set
a new connection for mongo with no type at all and still use your local
MongoHook without changing the Airflow source code.

The one part that is missing is the association from the connection to the
hook here in Connection.get_hook
https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L620
This allows getting the hook from the connection, and allows for the Data
Profiling section of the UI to know how to execute a query against that
connection type.

Max

On Tue, Jan 10, 2017 at 2:58 AM, Gael Magnan <gaelmagnan@gmail.com> wrote:

> @George Leslie-Waksman:
>   It could be done, but since it's marked as deprecated everywhere and Alex
> Van Boxel asked me tu used entry points I'm not sure we should do both.
>
> @Jeremiah Lowin:
>   I was thinking not changing much of the api, juste the implementation
> behind it.
>   So for example, adding the mongodb connection type would just be doing
> the following (assuming we have a MongodbHook):
>   ```
>   from airflow.models import BaseConnectionType
>
>   class MongodbConnectionType(BaseConnectionType):
>       name = 'mongodb'
>       description = 'MongoDB'
>
>       def get_hook(conn_id):
>           try:
>               from airflow.hooks.mongodb_hook import MongodbHook
>               return MongodbHook(conn_id=conn_id)
>           except:
>               pass
>   ```
>
>   and if we want to handle the query params (and we do) we could just do
> something like overriding the parse_from_uri method (not final code):
>   ```
>       @classmethod
>       def parse_from_uri(cls, uri):
>           temp_uri = urlparse(uri)
>           (host, schema, login, password, port,
>             extras) = super().parse_from_uri(cls, uri)
>           extras = dict([param.split('=', 1) for param in
> temp_uri.query.split('&')])
>           return host, schema, login, password, port, extras
>   ```
>
>   So if you defined a connection using an uri the way you are able now, and
> this URI started with mongodb,
>   it would be extracted and a connection created.
>
>
> @Maxime Beauchemin:
>   By can't easily I meant that to had a connection type you have to go
> through making a pr to airflow, you can't just use a plugin.
>   I think all you have to do to add a new one is modify the Connection
> class in models. But it's still hard coded.
>
>   I agree with you, we could easily use just a data structure for most of
> the connections, especially the ones existing now,
>   but it would make it harder to add a connection whose uri format is not
> consistant with the other ones (not to mention ones that don't use uri),
>   except if we also handle a string format that would parse the uri and set
> the right fields.
>
> Le mar. 10 janv. 2017 à 01:35, Maxime Beauchemin <
> maximebeauchemin@gmail.com>
> a écrit :
>
> > About *"you can't **easily add a connection type without modifying the
> > airflow code source*", can we start by listing out the places that need
> to
> > be touched up when adding a connection?
> >
> > Here's what I could find:
> > https://github.com/apache/incubator-airflow/blob/master/
> > airflow/models.py#L516
> > https://github.com/apache/incubator-airflow/blob/master/
> > airflow/utils/db.py#L109
> > https://github.com/apache/incubator-airflow/blob/master/
> > airflow/models.py#L620
> >
> > I didn't dig super deep but it seems like what we need is a configurable
> > connection configuration blob that informs the connection type name, the
> > verbose name and the related hook location.
> >
> > Maybe a solution is to have a `CONNECTION_TYPES` object in settings.py
> that
> > could be altered in different environments. This CONNECTION_TYPE would
> > store the name, the verbose_name and the related hook's path as a string
> > `airflow.hooks.mysql_hook.MySqlHook` or something like that. It may not
> be
> > that far from what you were proposing, and whether it should be a class
> or
> > a data structure is arguable, but I'd like to keep configuration elements
> > easily serializable so I'd vote for a data structure using only
> primitives.
> >
> > Max
> >
> > On Mon, Jan 9, 2017 at 5:13 AM, Alex Van Boxel <alex@vanboxel.be> wrote:
> >
> > > I was actually going to propose something different with entry-points,
> > but
> > > your requirement beat me to it (but that's ok :-). Actually I think
> with
> > > this mechanism people would be able to extend Airflow connection
> > mechanism
> > > (and later other stuff) by doing *pip install
> > airflow-sexy-new-connection*
> > > (for example).
> > >
> > > On Mon, Jan 9, 2017 at 1:39 PM Gael Magnan <gaelmagnan@gmail.com>
> wrote:
> > >
> > > > Thank you for the read, I'm gonna look at it, it's probably gonna be
> > > better
> > > > that what I have.
> > > >
> > > > Point taken about the URI, I'll see if i can find something generic
> > > enough
> > > > to handle all those cases.
> > > >
> > > > Le lun. 9 janv. 2017 à 13:36, Alex Van Boxel <alex@vanboxel.be>
a
> > écrit
> > > :
> > > >
> > > > > Thanks a lot, yes it clarifies a lot and I do agree you really need
> > to
> > > > hack
> > > > > inside Airflow to add a Connection type. While you're working at
> this
> > > > could
> > > > > you have a look at the standard python *entry-point mechanism* for
> > > > > registering Connection types/components.
> > > > >
> > > > > A good read on this:
> > > > >
> > > > >
> > > > http://docs.pylonsproject.org/projects/pylons-webframework/
> > > en/latest/advanced_pylons/entry_points_and_plugins.html
> > > > >
> > > > > My first though would be that just by adding an entry to the
> factory
> > > > method
> > > > > would be enough to register your Connection + ConnectionType and
> UI.
> > > > >
> > > > > Also note that not everything works with a URI. The Google Cloud
> > > > Connection
> > > > > doesn't have one, it uses a secret key file stored on disk, so
> don't
> > > > force
> > > > > every connection type to work with URI's.
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Jan 9, 2017 at 1:15 PM Gael Magnan <gaelmagnan@gmail.com>
> > > wrote:
> > > > >
> > > > > > Yes sure,
> > > > > >
> > > > > > The question was the following:
> > > > > > "I was looking at the code of the connections, and I realized
you
> > > can't
> > > > > > easily add a connection type without modifying the airflow code
> > > > source. I
> > > > > > wanted to create a mongodb connection type, but I think the
best
> > > > approche
> > > > > > would be to refactor connections first. Thoughts anyone?"
> > > > > >
> > > > > > The answer of Bolke de Bruin was: "making it more generic would
> be
> > > > > > appreciated"
> > > > > >
> > > > > > So basically the way the code is set up actually every types
of
> > > > > connection
> > > > > > existing is defined though a list in the Connection class. It
> > > > implements
> > > > > > exactly the same code for parsing uri to get connections info
and
> > > > doesn't
> > > > > > allow for a simple way to get back the uri from the connection
> > infos.
> > > > > >
> > > > > > I need to add a mongodb connection and a way to get it back
as a
> > uri,
> > > > so
> > > > > i
> > > > > > could use an other type of connection and play around with that
> or
> > > > juste
> > > > > > add one more hard coded connection type, but I though this might
> be
> > > > > > something that comes back regularly and having a simple way
to
> plug
> > > in
> > > > > new
> > > > > > types of connection would make it easier for anyone to
> contribute a
> > > new
> > > > > > connection type.
> > > > > >
> > > > > > Hope this clarifies my proposal.
> > > > > >
> > > > > > Le lun. 9 janv. 2017 à 12:46, Alex Van Boxel <alex@vanboxel.be>
> a
> > > > écrit
> > > > > :
> > > > > >
> > > > > > > Hey Gael,
> > > > > > >
> > > > > > > could you please recap the question here and provide some
> > context.
> > > > Not
> > > > > > > everyone on the mailinglist is actively following Gitter,
> > including
> > > > me.
> > > > > > > With some context it would be easier to give feedback.
Thanks.
> > > > > > >
> > > > > > > On Mon, Jan 9, 2017 at 11:15 AM Gael Magnan <
> > gaelmagnan@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > following my question on gitter the other day and
the
> response
> > > from
> > > > > > Bolke
> > > > > > > > de Bruin, I've started working on refactoring the
connections
> > in
> > > > > > airflow.
> > > > > > > >
> > > > > > > > Before submitting a PR I wanted to share my proposal
with you
> > and
> > > > get
> > > > > > > > feedbacks.
> > > > > > > >
> > > > > > > > The idea is quite simple, I've divided the Connection
class
> in
> > > two,
> > > > > > > > Connection and ConnectionType, connection has the
same
> > interface
> > > it
> > > > > had
> > > > > > > > before plus a few methods, but the class keeps a reference
> to a
> > > > > > > dictionary
> > > > > > > > of registered ConnectionType. It delegates the work
of
> parsing
> > > from
> > > > > > URI,
> > > > > > > > formatting to URI (added) and getting the hook to
the
> > > > ConnectionType
> > > > > > > > associated with the conn_type.
> > > > > > > >
> > > > > > > > I've thought of two ways of registering new ConnectionTypes,
> > the
> > > > > first
> > > > > > is
> > > > > > > > making the BaseConnectionType use a metaclass that
registered
> > any
> > > > new
> > > > > > > > ConnectionType with Connection when the class is declared,
it
> > > would
> > > > > > > require
> > > > > > > > the less work to extend the connection module, as
just
> > importing
> > > > the
> > > > > > file
> > > > > > > > with the connection would do the trick.
> > > > > > > > The second one is juste to have a function/classmethod
that
> you
> > > > call
> > > > > > > > manually to register your connection. It would be
simpler to
> > > > > understand
> > > > > > > but
> > > > > > > > requires more work every time you create a new
> ConnectionType.
> > > > > > > >
> > > > > > > > Hope this proposal is clear enough, and I'm waiting
for
> > feebacks
> > > > and
> > > > > > > > possible improvements.
> > > > > > > >
> > > > > > > > Regards
> > > > > > > > Gael Magnan de Bornier
> > > > > > > >
> > > > > > > --
> > > > > > >   _/
> > > > > > > _/ Alex Van Boxel
> > > > > > >
> > > > > >
> > > > > --
> > > > >   _/
> > > > > _/ Alex Van Boxel
> > > > >
> > > >
> > > --
> > >   _/
> > > _/ Alex Van Boxel
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message