airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gael Magnan <gaelmag...@gmail.com>
Subject Re: Refactoring Connection
Date Tue, 10 Jan 2017 10:58:41 GMT
@George Leslie-Waksman:
  It could be done, but since it's marked as deprecated everywhere and Alex
Van Boxel asked me tu used entry points I'm not sure we should do both.

@Jeremiah Lowin:
  I was thinking not changing much of the api, juste the implementation
behind it.
  So for example, adding the mongodb connection type would just be doing
the following (assuming we have a MongodbHook):
  ```
  from airflow.models import BaseConnectionType

  class MongodbConnectionType(BaseConnectionType):
      name = 'mongodb'
      description = 'MongoDB'

      def get_hook(conn_id):
          try:
              from airflow.hooks.mongodb_hook import MongodbHook
              return MongodbHook(conn_id=conn_id)
          except:
              pass
  ```

  and if we want to handle the query params (and we do) we could just do
something like overriding the parse_from_uri method (not final code):
  ```
      @classmethod
      def parse_from_uri(cls, uri):
          temp_uri = urlparse(uri)
          (host, schema, login, password, port,
            extras) = super().parse_from_uri(cls, uri)
          extras = dict([param.split('=', 1) for param in
temp_uri.query.split('&')])
          return host, schema, login, password, port, extras
  ```

  So if you defined a connection using an uri the way you are able now, and
this URI started with mongodb,
  it would be extracted and a connection created.


@Maxime Beauchemin:
  By can't easily I meant that to had a connection type you have to go
through making a pr to airflow, you can't just use a plugin.
  I think all you have to do to add a new one is modify the Connection
class in models. But it's still hard coded.

  I agree with you, we could easily use just a data structure for most of
the connections, especially the ones existing now,
  but it would make it harder to add a connection whose uri format is not
consistant with the other ones (not to mention ones that don't use uri),
  except if we also handle a string format that would parse the uri and set
the right fields.

Le mar. 10 janv. 2017 à 01:35, Maxime Beauchemin <maximebeauchemin@gmail.com>
a écrit :

> About *"you can't **easily add a connection type without modifying the
> airflow code source*", can we start by listing out the places that need to
> be touched up when adding a connection?
>
> Here's what I could find:
> https://github.com/apache/incubator-airflow/blob/master/
> airflow/models.py#L516
> https://github.com/apache/incubator-airflow/blob/master/
> airflow/utils/db.py#L109
> https://github.com/apache/incubator-airflow/blob/master/
> airflow/models.py#L620
>
> I didn't dig super deep but it seems like what we need is a configurable
> connection configuration blob that informs the connection type name, the
> verbose name and the related hook location.
>
> Maybe a solution is to have a `CONNECTION_TYPES` object in settings.py that
> could be altered in different environments. This CONNECTION_TYPE would
> store the name, the verbose_name and the related hook's path as a string
> `airflow.hooks.mysql_hook.MySqlHook` or something like that. It may not be
> that far from what you were proposing, and whether it should be a class or
> a data structure is arguable, but I'd like to keep configuration elements
> easily serializable so I'd vote for a data structure using only primitives.
>
> Max
>
> On Mon, Jan 9, 2017 at 5:13 AM, Alex Van Boxel <alex@vanboxel.be> wrote:
>
> > I was actually going to propose something different with entry-points,
> but
> > your requirement beat me to it (but that's ok :-). Actually I think with
> > this mechanism people would be able to extend Airflow connection
> mechanism
> > (and later other stuff) by doing *pip install
> airflow-sexy-new-connection*
> > (for example).
> >
> > On Mon, Jan 9, 2017 at 1:39 PM Gael Magnan <gaelmagnan@gmail.com> wrote:
> >
> > > Thank you for the read, I'm gonna look at it, it's probably gonna be
> > better
> > > that what I have.
> > >
> > > Point taken about the URI, I'll see if i can find something generic
> > enough
> > > to handle all those cases.
> > >
> > > Le lun. 9 janv. 2017 à 13:36, Alex Van Boxel <alex@vanboxel.be> a
> écrit
> > :
> > >
> > > > Thanks a lot, yes it clarifies a lot and I do agree you really need
> to
> > > hack
> > > > inside Airflow to add a Connection type. While you're working at this
> > > could
> > > > you have a look at the standard python *entry-point mechanism* for
> > > > registering Connection types/components.
> > > >
> > > > A good read on this:
> > > >
> > > >
> > > http://docs.pylonsproject.org/projects/pylons-webframework/
> > en/latest/advanced_pylons/entry_points_and_plugins.html
> > > >
> > > > My first though would be that just by adding an entry to the factory
> > > method
> > > > would be enough to register your Connection + ConnectionType and UI.
> > > >
> > > > Also note that not everything works with a URI. The Google Cloud
> > > Connection
> > > > doesn't have one, it uses a secret key file stored on disk, so don't
> > > force
> > > > every connection type to work with URI's.
> > > >
> > > >
> > > >
> > > > On Mon, Jan 9, 2017 at 1:15 PM Gael Magnan <gaelmagnan@gmail.com>
> > wrote:
> > > >
> > > > > Yes sure,
> > > > >
> > > > > The question was the following:
> > > > > "I was looking at the code of the connections, and I realized you
> > can't
> > > > > easily add a connection type without modifying the airflow code
> > > source. I
> > > > > wanted to create a mongodb connection type, but I think the best
> > > approche
> > > > > would be to refactor connections first. Thoughts anyone?"
> > > > >
> > > > > The answer of Bolke de Bruin was: "making it more generic would be
> > > > > appreciated"
> > > > >
> > > > > So basically the way the code is set up actually every types of
> > > > connection
> > > > > existing is defined though a list in the Connection class. It
> > > implements
> > > > > exactly the same code for parsing uri to get connections info and
> > > doesn't
> > > > > allow for a simple way to get back the uri from the connection
> infos.
> > > > >
> > > > > I need to add a mongodb connection and a way to get it back as a
> uri,
> > > so
> > > > i
> > > > > could use an other type of connection and play around with that or
> > > juste
> > > > > add one more hard coded connection type, but I though this might
be
> > > > > something that comes back regularly and having a simple way to plug
> > in
> > > > new
> > > > > types of connection would make it easier for anyone to contribute
a
> > new
> > > > > connection type.
> > > > >
> > > > > Hope this clarifies my proposal.
> > > > >
> > > > > Le lun. 9 janv. 2017 à 12:46, Alex Van Boxel <alex@vanboxel.be>
a
> > > écrit
> > > > :
> > > > >
> > > > > > Hey Gael,
> > > > > >
> > > > > > could you please recap the question here and provide some
> context.
> > > Not
> > > > > > everyone on the mailinglist is actively following Gitter,
> including
> > > me.
> > > > > > With some context it would be easier to give feedback. Thanks.
> > > > > >
> > > > > > On Mon, Jan 9, 2017 at 11:15 AM Gael Magnan <
> gaelmagnan@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > following my question on gitter the other day and the response
> > from
> > > > > Bolke
> > > > > > > de Bruin, I've started working on refactoring the connections
> in
> > > > > airflow.
> > > > > > >
> > > > > > > Before submitting a PR I wanted to share my proposal with
you
> and
> > > get
> > > > > > > feedbacks.
> > > > > > >
> > > > > > > The idea is quite simple, I've divided the Connection class
in
> > two,
> > > > > > > Connection and ConnectionType, connection has the same
> interface
> > it
> > > > had
> > > > > > > before plus a few methods, but the class keeps a reference
to a
> > > > > > dictionary
> > > > > > > of registered ConnectionType. It delegates the work of
parsing
> > from
> > > > > URI,
> > > > > > > formatting to URI (added) and getting the hook to the
> > > ConnectionType
> > > > > > > associated with the conn_type.
> > > > > > >
> > > > > > > I've thought of two ways of registering new ConnectionTypes,
> the
> > > > first
> > > > > is
> > > > > > > making the BaseConnectionType use a metaclass that registered
> any
> > > new
> > > > > > > ConnectionType with Connection when the class is declared,
it
> > would
> > > > > > require
> > > > > > > the less work to extend the connection module, as just
> importing
> > > the
> > > > > file
> > > > > > > with the connection would do the trick.
> > > > > > > The second one is juste to have a function/classmethod
that you
> > > call
> > > > > > > manually to register your connection. It would be simpler
to
> > > > understand
> > > > > > but
> > > > > > > requires more work every time you create a new ConnectionType.
> > > > > > >
> > > > > > > Hope this proposal is clear enough, and I'm waiting for
> feebacks
> > > and
> > > > > > > possible improvements.
> > > > > > >
> > > > > > > Regards
> > > > > > > Gael Magnan de Bornier
> > > > > > >
> > > > > > --
> > > > > >   _/
> > > > > > _/ Alex Van Boxel
> > > > > >
> > > > >
> > > > --
> > > >   _/
> > > > _/ Alex Van Boxel
> > > >
> > >
> > --
> >   _/
> > _/ Alex Van Boxel
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message