airflow-dev mailing list archives

From Tao Feng <fengta...@gmail.com>
Subject Re: Custom DagBag collection based on dags_folder prefix
Date Sat, 17 Mar 2018 00:02:42 GMT
+1.

I think having a design doc is good, and it would be great if you could
create a couple of small JIRAs for the related tasks. I am interested in
helping out if possible.

Thanks,
-Tao

On Fri, Mar 16, 2018 at 3:52 AM, Diogo Franco <diogoalexfranco@gmail.com>
wrote:

> Created this JIRA <https://issues.apache.org/jira/browse/AIRFLOW-2221>.
>
> I'm happy to take a shot at this with an initial implementation for review,
> but if it is preferred to start with a design doc or something, let me
> know.
>
> Thank you for the guidance, cheers,
> Diogo
>
> On 16 March 2018 at 00:08, Maxime Beauchemin <maximebeauchemin@gmail.com>
> wrote:
>
> > I'm happy to commit to provide guidance and review the code if someone
> > wants to work on this feature.
> >
> > Max
> >
> > On Thu, Mar 15, 2018 at 4:42 PM, Kevin Pamplona <kpamplon@justin.tv>
> > wrote:
> >
> > > I'd also definitely be interested in this, as we have an async cron
> > > job that syncs with a remote S3 location. I'd also be happy to help
> > > tackle some of this work if there's a ticket involved.
> > >
> > >
> > > On Thu, Mar 15, 2018 at 4:38 PM, Joy Gao <joyg@wepay.com> wrote:
> > >
> > > > Hi guys,
> > > >
> > > > A related topic has been discussed recently via a separate email
> > > > thread (see 'How to add hooks for strong deployment consistency?
> > > > <https://lists.apache.org/thread.html/%3CCAB=riaAXKSkEa4A7vx0MzpP7jsh0kktP0WhZKWgwdD1vr2sQtw@mail.gmail.com%3E>')
> > > >
> > > > The idea brought up by Maxime is to modify DagBag and implement a
> > > > DagFetcher abstraction, where the default is "FileSystemDagFetcher",
> > > > but it opens the door for "GitRepoDagFetcher",
> > > > "ArtifactoryDagFetcher", "TarballInS3DagFetcher", or in this case,
> > > > "HDFSDagFetcher", "S3DagFetcher", and "GCSDagFetcher".
> > > >
> > > > We are all in favor of this, but as far as I'm aware no one has
> > > > owned this yet. So if you (or anyone) wants to work on this, please
> > > > create a JIRA and call it out :)
> > > >
> > > > Cheers,
> > > > Joy
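[Editor's note: the DagFetcher abstraction described in Joy's message could be sketched roughly as below. All class and function names here are illustrative assumptions, not Airflow's actual API; only the filesystem case is implemented, with the remote fetchers left as registry stubs.]

```python
from abc import ABC, abstractmethod


class DagFetcher(ABC):
    """Illustrative base class: makes DAG files available locally."""

    @abstractmethod
    def fetch(self, dags_folder: str) -> str:
        """Return a local directory containing the DAG files."""


class FileSystemDagFetcher(DagFetcher):
    """Default behavior: the dags_folder is already a local path."""

    def fetch(self, dags_folder: str) -> str:
        return dags_folder


# Hypothetical registry keyed by URL scheme; remote fetchers would be
# registered here ("s3", "hdfs", "gcs", "git", ...).
FETCHERS = {
    "": FileSystemDagFetcher,      # no scheme -> local filesystem
    "file": FileSystemDagFetcher,
}


def get_fetcher(dags_folder: str) -> DagFetcher:
    """Pick a fetcher based on the dags_folder URL scheme."""
    scheme = dags_folder.split("://", 1)[0] if "://" in dags_folder else ""
    try:
        return FETCHERS[scheme]()
    except KeyError:
        raise ValueError(f"No DagFetcher registered for scheme {scheme!r}")
```

Under this sketch, DagBag would call get_fetcher(conf.dags_folder).fetch(...) once and then walk the returned local path exactly as it does today.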
> > > >
> > > >
> > > >
> > > > On Thu, Mar 15, 2018 at 3:54 PM, Chris Fei <cfei18@gmail.com> wrote:
> > > >
> > > > > Hi Diogo,
> > > > >
> > > > > This would be valuable for me as well; I'd love first-class
> > > > > support for hdfs://..., s3://..., gcs://..., etc. as a value for
> > > > > dags_folder. As a workaround, I deploy a maintenance DAG that
> > > > > periodically downloads other DAGs from GCS into my DAG folder.
> > > > > Not perfect, but it gets the job done.
> > > > > Chris
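[Editor's note: the sync step of a maintenance DAG like the one Chris describes might look roughly like this. It assumes gsutil is installed on the workers; the bucket name and local path are placeholders, not taken from the thread. In the DAG itself, sync_dags would be the task callable (or the command string would go in a BashOperator).]

```python
import subprocess


def build_sync_command(bucket_prefix: str, dags_folder: str) -> list:
    """Assemble the gsutil rsync invocation: -m parallelizes, -r recurses,
    -d deletes local files that were removed from the bucket."""
    return ["gsutil", "-m", "rsync", "-r", "-d", bucket_prefix, dags_folder]


def sync_dags(bucket_prefix: str, dags_folder: str) -> None:
    """Mirror the GCS prefix into the local DAG folder."""
    subprocess.check_call(build_sync_command(bucket_prefix, dags_folder))


# Example (placeholder paths):
# sync_dags("gs://my-bucket/dags/", "/usr/local/airflow/dags/")
```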
> > > > >
> > > > > On Thu, Mar 15, 2018, at 6:32 PM, Diogo Franco wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > I think that the ability to fill up the DagBag from remote
> > > > > > locations would be useful (in my use case, having the dags
> > > > > > folder in HDFS would greatly simplify the release process).
> > > > > >
> > > > > > Was there any discussion on this previously? I looked around
> > > > > > briefly but couldn't find it.
> > > > > >
> > > > > > Maybe the method **DagBag.collect_dags** in *airflow/models.py*
> > > > > > could delegate the walking part to specific methods based on the
> > > > > > *dags_folder* prefix, in a sort of plugin architecture. This
> > > > > > would allow the dags_folder to be defined like
> > > > > > hdfs://namenode/user/airflow/dags, or s3://...
> > > > > >
> > > > > > If this makes sense, I'd love to work on it.
> > > > > >
> > > > > > Cheers,
> > > > > > Diogo Franco
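[Editor's note: the prefix-based delegation Diogo proposes could be sketched as below. The registry and function names are hypothetical; only the local walker is implemented, with remote walkers left as commented stubs.]

```python
import os


def walk_local(dags_folder):
    """Yield the .py files under a local dags_folder, as collect_dags does."""
    for root, _dirs, files in os.walk(dags_folder):
        for name in files:
            if name.endswith(".py"):
                yield os.path.join(root, name)


# Hypothetical walker registry keyed by the dags_folder URL scheme.
WALKERS = {"": walk_local, "file": walk_local}
# WALKERS["hdfs"] = walk_hdfs   # e.g. via a Python HDFS client
# WALKERS["s3"] = walk_s3       # e.g. via boto3 list_objects


def collect_dag_files(dags_folder):
    """Delegate the walking step based on the dags_folder prefix."""
    scheme, sep, _rest = dags_folder.partition("://")
    walker = WALKERS[scheme if sep else ""]
    return list(walker(dags_folder))
```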
> > > > >
> > > > >
> > > >
> > >
> >
>
