airflow-dev mailing list archives

From Kevin Pamplona <kpamp...@justin.tv>
Subject Re: Custom DagBag collection based on dags_folder prefix
Date Thu, 15 Mar 2018 23:42:37 GMT
I'd also definitely be interested in this, as we have an async cron job that
syncs with a remote S3 location. I'd be happy to help tackle some of this
work if there's a ticket involved.


On Thu, Mar 15, 2018 at 4:38 PM, Joy Gao <joyg@wepay.com> wrote:

> Hi guys,
>
> A related topic has been discussed recently via a separate email thread
> (see 'How to add hooks for strong deployment consistency?
> <https://lists.apache.org/thread.html/%3CCAB=
> riaAXKSkEa4A7vx0MzpP7jsh0kktP0WhZKWgwdD1vr2sQtw@mail.gmail.com%3E>
> ')
>
> The idea brought up by Maxime is to modify DagBag and implement a
> DagFetcher abstraction, where the default is "FileSystemDagFetcher", but
> which opens the door to "GitRepoDagFetcher", "ArtifactoryDagFetcher",
> "TarballInS3DagFetcher", or in this case, "HDFSDagFetcher", "S3DagFetcher",
> and "GCSDagFetcher".
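[The abstraction described above could be sketched roughly as follows. The
fetcher class names come from the thread itself, but the interface is
hypothetical; nothing like it exists in Airflow at the time of writing.]

```python
# Hypothetical sketch of the proposed DagFetcher abstraction.
# The method names and signatures here are illustrative, not Airflow APIs.
from abc import ABC, abstractmethod


class DagFetcher(ABC):
    """Materializes DAG definition files for the DagBag to parse."""

    @abstractmethod
    def fetch(self, dags_url: str, local_dir: str) -> str:
        """Fetch DAG files from dags_url into local_dir; return the local path."""


class FileSystemDagFetcher(DagFetcher):
    """Default fetcher: the dags_folder is already a local path."""

    def fetch(self, dags_url: str, local_dir: str) -> str:
        return dags_url  # nothing to download


class S3DagFetcher(DagFetcher):
    """Would sync DAG files from an s3:// prefix into local_dir (sketch only)."""

    def fetch(self, dags_url: str, local_dir: str) -> str:
        # e.g. use boto3 to copy objects under dags_url into local_dir
        raise NotImplementedError
```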
>
> We are all in favor of this, but as far as I'm aware no one has owned this
> yet. So if you (or anyone) wants to work on this, please create a JIRA and
> call it out :)
>
> Cheers,
> Joy
>
>
>
> On Thu, Mar 15, 2018 at 3:54 PM, Chris Fei <cfei18@gmail.com> wrote:
>
> > Hi Diogo,
> >
> > This would be valuable for me as well; I'd love first-class support for
> > hdfs://..., s3://..., gcs://..., etc. as a value for dags_folder. As a
> > workaround, I deploy a maintenance DAG that periodically downloads other
> > DAGs from GCS into my DAG folder. Not perfect, but it gets the job done.
> > Chris
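[The workaround Chris describes could look something like the function below,
run on a short schedule inside a PythonOperator. The function and parameter
names are illustrative; `client` stands in for a google.cloud.storage client,
or anything else exposing `list_blobs()` and blob `download_to_filename()`.]

```python
# Sketch of a periodic "sync remote DAGs" task body (names are hypothetical).
import os


def sync_remote_dags(client, bucket: str, prefix: str, dag_folder: str) -> list:
    """Download every .py object under prefix into dag_folder; return local paths."""
    downloaded = []
    for blob in client.list_blobs(bucket, prefix=prefix):
        if not blob.name.endswith(".py"):
            continue  # skip non-DAG objects
        local_path = os.path.join(dag_folder, os.path.basename(blob.name))
        blob.download_to_filename(local_path)
        downloaded.append(local_path)
    return downloaded
```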
> >
> > On Thu, Mar 15, 2018, at 6:32 PM, Diogo Franco wrote:
> > > Hi all,
> > >
> > > I think that the ability to fill up the DagBag from remote locations
> > > would be useful (in my use case, having the dags folder in HDFS would
> > > greatly simplify the release process).
> > >
> > > Was there any discussion on this previously? I looked around briefly
> > > but couldn't find it.
> > >
> > > Maybe the method DagBag.collect_dags in airflow/models.py could
> > > delegate the walking part to specific methods based on the dags_folder
> > > prefix, in a sort of plugin architecture. This would allow the
> > > dags_folder to be defined like hdfs://namenode/user/airflow/dags, or
> > > s3://...
> > >
> > > If this makes sense, I'd love to work on it.
> > >
> > > Cheers,
> > > Diogo Franco
> >
> >
>
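[The prefix-based dispatch Diogo proposes for DagBag.collect_dags could be
sketched as a small registry keyed on the URL scheme of dags_folder. The
registry and fetcher names below are hypothetical, drawn from the thread.]

```python
# Sketch: pick a DAG fetcher from the scheme of the configured dags_folder.
from urllib.parse import urlparse

FETCHER_REGISTRY = {
    "": "FileSystemDagFetcher",  # plain local path, e.g. /opt/airflow/dags
    "hdfs": "HDFSDagFetcher",
    "s3": "S3DagFetcher",
    "gcs": "GCSDagFetcher",
}


def resolve_fetcher(dags_folder: str) -> str:
    """Return the fetcher name registered for the dags_folder URL scheme."""
    scheme = urlparse(dags_folder).scheme
    try:
        return FETCHER_REGISTRY[scheme]
    except KeyError:
        raise ValueError("No DagFetcher registered for scheme %r" % scheme)
```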
