airflow-dev mailing list archives

From "Dennis O'Brien" <den...@dennisobrien.net>
Subject Re: best way to handle version upgrades of libraries used by tasks
Date Mon, 05 Feb 2018 07:17:10 GMT
Thanks for the input!  I'll take a look at using queues for this.

thanks,
Dennis

On Tue, Jan 30, 2018 at 4:17 PM Hbw <brian@heisenbergwoodworking.com> wrote:

> Run them on different workers by using queues?
> That way different workers can have different 3rd party libs while sharing
> the same Airflow core.
>
> B
>
> Sent from a device with less than stellar autocorrect
>
> > On Jan 30, 2018, at 9:13 AM, Dennis O'Brien <dennis@dennisobrien.net> wrote:
> >
> > Hi All,
> >
> > I have a number of jobs that use scikit-learn for scoring players.
> > Occasionally I need to upgrade scikit-learn to take advantage of some new
> > features.  We have a single conda environment that specifies all the
> > dependencies for Airflow as well as for all of our DAGs.  So currently
> > upgrading scikit-learn means upgrading it for all DAGs that use it, and
> > retraining all models for that version.  It becomes a very involved task
> > and I'm hoping to find a better way.
> >
> > One option is to use BashOperator (or something that wraps BashOperator)
> > and have bash use a specific conda environment with that version of
> > scikit-learn.  While simple, I don't like the idea of limiting task input
> > to the command line.  Still, an option.
> >
> > Another option is the DockerOperator.  But when I asked around at a
> > previous Airflow Meetup, I couldn't find anyone actually using it.  It also
> > adds some complexity to the build and deploy process in that now I have to
> > maintain docker images for all my environments.  Still, not ruling it out.
> >
> > And the last option I can think of is just heterogeneous workers.  We are
> > migrating our Airflow infrastructure to AWS ECS (from EC2) and plan on
> > having support for separate worker clusters, so this could include workers
> > with different conda environments.  I assume as long as a few key packages
> > are identical between scheduler and worker instances (airflow, redis,
> > celery?) the rest can be whatever.
> >
> > Has anyone faced this problem, and do you have some advice?  Am I missing any
> > simpler options?  Any thoughts much appreciated.
> >
> > thanks,
> > Dennis
>
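
The queue suggestion in the thread could be sketched roughly as below. This is a minimal sketch under assumptions and not something either poster wrote: the queue names, the version-to-queue mapping, and the helper `operator_kwargs` are all hypothetical. It relies only on the `queue` argument that Airflow operators accept, together with Celery workers that each listen on a named queue.

```python
# Hypothetical mapping from a scikit-learn version to a Celery queue name.
# Each queue is assumed to be served by workers whose conda environment
# pins that scikit-learn version.
SKLEARN_QUEUES = {
    "0.18": "sklearn_0_18",
    "0.19": "sklearn_0_19",
}


def operator_kwargs(task_id, sklearn_version):
    """Build operator keyword arguments that route a task to the worker
    pool carrying the requested scikit-learn version."""
    try:
        queue = SKLEARN_QUEUES[sklearn_version]
    except KeyError:
        raise ValueError(
            "no worker queue configured for scikit-learn %s" % sklearn_version
        )
    return {"task_id": task_id, "queue": queue}


# In a DAG file the result would be splatted into any operator, e.g.
# (score_players and dag are placeholders defined elsewhere):
#
#   PythonOperator(python_callable=score_players, dag=dag,
#                  **operator_kwargs("score_players", "0.19"))
```

A worker running in the matching environment would then subscribe to that queue only, for example with `airflow worker -q sklearn_0_19` on the 1.x command line. Upgrading scikit-learn for one DAG then means adding a queue and a worker pool, while the other DAGs keep their existing models and environment.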
