airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gael Magnan <gaelmag...@gmail.com>
Subject Re: Refactoring Connection
Date Wed, 11 Jan 2017 14:48:51 GMT
Thanks for all the information,

currently what I have is working iso with what was existing, just need to
add some doc and tests, but I have a few questions.

Currently airflow/urtils/db.py initdb function creates a lot of connections
by default.
I understand that they are used for the tests, but why are they created
even on production environment?
Looking into that made me realize that we can create connections of a type
that is not defined, (ex: beeline, aws, ...) by code just like you said but
it's to the best of my knowledge impossible with the UI.

By the way, should I create the beeline and aws connection types? So
everyone can create them through the UI?

I've created a JIRA issue:https://issues.apache.org/jira/browse/AIRFLOW-235,
and a first draft of the PR:
https://github.com/apache/incubator-airflow/pull/1981, more tests and doc
coming + all the changes depending on where this discussion takes us.


Le mar. 10 janv. 2017 à 21:55, Maxime Beauchemin <maximebeauchemin@gmail.com>
a écrit :

> Note that the `Conection.extra` attribute was designed to store anything
> that is specific to a connection type. Here's an example of how it's used
> in the HiveHook:
>
> https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/hive_hooks.py#L75
>
> Also note that currently there's no enforcement of connection type that
> match a hook, so you can use any connection with any hook. So you could set
> a new connection for mongo with no type at all and still use your local
> MongoHook without changing the Airflow source code.
>
> The one part that is missing is the association from the connection to the
> hook here in Connection.get_hook
>
> https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L620
> This allows getting the hook from the connection, and allows for the Data
> Profiling section of the UI to know how to execute a query against that
> connection type.
>
> Max
>
> On Tue, Jan 10, 2017 at 2:58 AM, Gael Magnan <gaelmagnan@gmail.com> wrote:
>
> > @George Leslie-Waksman:
> >   It could be done, but since it's marked as deprecated everywhere and
> Alex
> > Van Boxel asked me tu used entry points I'm not sure we should do both.
> >
> > @Jeremiah Lowin:
> >   I was thinking not changing much of the api, juste the implementation
> > behind it.
> >   So for example, adding the mongodb connection type would just be doing
> > the following (assuming we have a MongodbHook):
> >   ```
> >   from airflow.models import BaseConnectionType
> >
> >   class MongodbConnectionType(BaseConnectionType):
> >       name = 'mongodb'
> >       description = 'MongoDB'
> >
> >       def get_hook(conn_id):
> >           try:
> >               from airflow.hooks.mongodb_hook import MongodbHook
> >               return MongodbHook(conn_id=conn_id)
> >           except:
> >               pass
> >   ```
> >
> >   and if we want to handle the query params (and we do) we could just do
> > something like overriding the parse_from_uri method (not final code):
> >   ```
> >       @classmethod
> >       def parse_from_uri(cls, uri):
> >           temp_uri = urlparse(uri)
> >           (host, schema, login, password, port,
> >             extras) = super().parse_from_uri(cls, uri)
> >           extras = dict([param.split('=', 1) for param in
> > temp_uri.query.split('&')])
> >           return host, schema, login, password, port, extras
> >   ```
> >
> >   So if you defined a connection using an uri the way you are able now,
> and
> > this URI started with mongodb,
> >   it would be extracted and a connection created.
> >
> >
> > @Maxime Beauchemin:
> >   By can't easily I meant that to had a connection type you have to go
> > through making a pr to airflow, you can't just use a plugin.
> >   I think all you have to do to add a new one is modify the Connection
> > class in models. But it's still hard coded.
> >
> >   I agree with you, we could easily use just a data structure for most of
> > the connections, especially the ones existing now,
> >   but it would make it harder to add a connection whose uri format is not
> > consistant with the other ones (not to mention ones that don't use uri),
> >   except if we also handle a string format that would parse the uri and
> set
> > the right fields.
> >
> > Le mar. 10 janv. 2017 à 01:35, Maxime Beauchemin <
> > maximebeauchemin@gmail.com>
> > a écrit :
> >
> > > About *"you can't **easily add a connection type without modifying the
> > > airflow code source*", can we start by listing out the places that need
> > to
> > > be touched up when adding a connection?
> > >
> > > Here's what I could find:
> > > https://github.com/apache/incubator-airflow/blob/master/
> > > airflow/models.py#L516
> > > https://github.com/apache/incubator-airflow/blob/master/
> > > airflow/utils/db.py#L109
> > > https://github.com/apache/incubator-airflow/blob/master/
> > > airflow/models.py#L620
> > >
> > > I didn't dig super deep but it seems like what we need is a
> configurable
> > > connection configuration blob that informs the connection type name,
> the
> > > verbose name and the related hook location.
> > >
> > > Maybe a solution is to have a `CONNECTION_TYPES` object in settings.py
> > that
> > > could be altered in different environments. This CONNECTION_TYPE would
> > > store the name, the verbose_name and the related hook's path as a
> string
> > > `airflow.hooks.mysql_hook.MySqlHook` or something like that. It may not
> > be
> > > that far from what you were proposing, and whether it should be a class
> > or
> > > a data structure is arguable, but I'd like to keep configuration
> elements
> > > easily serializable so I'd vote for a data structure using only
> > primitives.
> > >
> > > Max
> > >
> > > On Mon, Jan 9, 2017 at 5:13 AM, Alex Van Boxel <alex@vanboxel.be>
> wrote:
> > >
> > > > I was actually going to propose something different with
> entry-points,
> > > but
> > > > your requirement beat me to it (but that's ok :-). Actually I think
> > with
> > > > this mechanism people would be able to extend Airflow connection
> > > mechanism
> > > > (and later other stuff) by doing *pip install
> > > airflow-sexy-new-connection*
> > > > (for example).
> > > >
> > > > On Mon, Jan 9, 2017 at 1:39 PM Gael Magnan <gaelmagnan@gmail.com>
> > wrote:
> > > >
> > > > > Thank you for the read, I'm gonna look at it, it's probably gonna
> be
> > > > better
> > > > > that what I have.
> > > > >
> > > > > Point taken about the URI, I'll see if i can find something generic
> > > > enough
> > > > > to handle all those cases.
> > > > >
> > > > > Le lun. 9 janv. 2017 à 13:36, Alex Van Boxel <alex@vanboxel.be>
a
> > > écrit
> > > > :
> > > > >
> > > > > > Thanks a lot, yes it clarifies a lot and I do agree you really
> need
> > > to
> > > > > hack
> > > > > > inside Airflow to add a Connection type. While you're working
at
> > this
> > > > > could
> > > > > > you have a look at the standard python *entry-point mechanism*
> for
> > > > > > registering Connection types/components.
> > > > > >
> > > > > > A good read on this:
> > > > > >
> > > > > >
> > > > > http://docs.pylonsproject.org/projects/pylons-webframework/
> > > > en/latest/advanced_pylons/entry_points_and_plugins.html
> > > > > >
> > > > > > My first though would be that just by adding an entry to the
> > factory
> > > > > method
> > > > > > would be enough to register your Connection + ConnectionType
and
> > UI.
> > > > > >
> > > > > > Also note that not everything works with a URI. The Google Cloud
> > > > > Connection
> > > > > > doesn't have one, it uses a secret key file stored on disk,
so
> > don't
> > > > > force
> > > > > > every connection type to work with URI's.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Jan 9, 2017 at 1:15 PM Gael Magnan <gaelmagnan@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > > > Yes sure,
> > > > > > >
> > > > > > > The question was the following:
> > > > > > > "I was looking at the code of the connections, and I realized
> you
> > > > can't
> > > > > > > easily add a connection type without modifying the airflow
code
> > > > > source. I
> > > > > > > wanted to create a mongodb connection type, but I think
the
> best
> > > > > approche
> > > > > > > would be to refactor connections first. Thoughts anyone?"
> > > > > > >
> > > > > > > The answer of Bolke de Bruin was: "making it more generic
would
> > be
> > > > > > > appreciated"
> > > > > > >
> > > > > > > So basically the way the code is set up actually every
types of
> > > > > > connection
> > > > > > > existing is defined though a list in the Connection class.
It
> > > > > implements
> > > > > > > exactly the same code for parsing uri to get connections
info
> and
> > > > > doesn't
> > > > > > > allow for a simple way to get back the uri from the connection
> > > infos.
> > > > > > >
> > > > > > > I need to add a mongodb connection and a way to get it
back as
> a
> > > uri,
> > > > > so
> > > > > > i
> > > > > > > could use an other type of connection and play around with
that
> > or
> > > > > juste
> > > > > > > add one more hard coded connection type, but I though this
> might
> > be
> > > > > > > something that comes back regularly and having a simple
way to
> > plug
> > > > in
> > > > > > new
> > > > > > > types of connection would make it easier for anyone to
> > contribute a
> > > > new
> > > > > > > connection type.
> > > > > > >
> > > > > > > Hope this clarifies my proposal.
> > > > > > >
> > > > > > > Le lun. 9 janv. 2017 à 12:46, Alex Van Boxel <alex@vanboxel.be
> >
> > a
> > > > > écrit
> > > > > > :
> > > > > > >
> > > > > > > > Hey Gael,
> > > > > > > >
> > > > > > > > could you please recap the question here and provide
some
> > > context.
> > > > > Not
> > > > > > > > everyone on the mailinglist is actively following
Gitter,
> > > including
> > > > > me.
> > > > > > > > With some context it would be easier to give feedback.
> Thanks.
> > > > > > > >
> > > > > > > > On Mon, Jan 9, 2017 at 11:15 AM Gael Magnan <
> > > gaelmagnan@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > following my question on gitter the other day
and the
> > response
> > > > from
> > > > > > > Bolke
> > > > > > > > > de Bruin, I've started working on refactoring
the
> connections
> > > in
> > > > > > > airflow.
> > > > > > > > >
> > > > > > > > > Before submitting a PR I wanted to share my proposal
with
> you
> > > and
> > > > > get
> > > > > > > > > feedbacks.
> > > > > > > > >
> > > > > > > > > The idea is quite simple, I've divided the Connection
class
> > in
> > > > two,
> > > > > > > > > Connection and ConnectionType, connection has
the same
> > > interface
> > > > it
> > > > > > had
> > > > > > > > > before plus a few methods, but the class keeps
a reference
> > to a
> > > > > > > > dictionary
> > > > > > > > > of registered ConnectionType. It delegates the
work of
> > parsing
> > > > from
> > > > > > > URI,
> > > > > > > > > formatting to URI (added) and getting the hook
to the
> > > > > ConnectionType
> > > > > > > > > associated with the conn_type.
> > > > > > > > >
> > > > > > > > > I've thought of two ways of registering new
> ConnectionTypes,
> > > the
> > > > > > first
> > > > > > > is
> > > > > > > > > making the BaseConnectionType use a metaclass
that
> registered
> > > any
> > > > > new
> > > > > > > > > ConnectionType with Connection when the class
is declared,
> it
> > > > would
> > > > > > > > require
> > > > > > > > > the less work to extend the connection module,
as just
> > > importing
> > > > > the
> > > > > > > file
> > > > > > > > > with the connection would do the trick.
> > > > > > > > > The second one is juste to have a function/classmethod
that
> > you
> > > > > call
> > > > > > > > > manually to register your connection. It would
be simpler
> to
> > > > > > understand
> > > > > > > > but
> > > > > > > > > requires more work every time you create a new
> > ConnectionType.
> > > > > > > > >
> > > > > > > > > Hope this proposal is clear enough, and I'm waiting
for
> > > feebacks
> > > > > and
> > > > > > > > > possible improvements.
> > > > > > > > >
> > > > > > > > > Regards
> > > > > > > > > Gael Magnan de Bornier
> > > > > > > > >
> > > > > > > > --
> > > > > > > >   _/
> > > > > > > > _/ Alex Van Boxel
> > > > > > > >
> > > > > > >
> > > > > > --
> > > > > >   _/
> > > > > > _/ Alex Van Boxel
> > > > > >
> > > > >
> > > > --
> > > >   _/
> > > > _/ Alex Van Boxel
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message