airflow-dev mailing list archives

From Maxime Beauchemin <maximebeauche...@gmail.com>
Subject Re: Per-task resources with Mesos
Date Fri, 04 Aug 2017 21:48:49 GMT
At Airbnb, using the Celery executor, we use queues to wire tasks to
machines provisioned in specific ways, and we use the cgroup feature to
constrain resource utilization as we fire up tasks. That requires running
the worker service as root, since root is needed to impersonate users and
to use cgroups.
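To make the queue idea concrete, here's a plain-Python sketch of the routing logic (task and queue names are made up; in Airflow itself this is the `queue` argument on operators, not the code below):

```python
# Illustrative only: each task declares a queue, and a worker bound to a
# given queue only picks up tasks tagged for it.
tasks = [
    {"task_id": "light_etl", "queue": "default"},
    {"task_id": "heavy_train", "queue": "high_mem"},  # routed to big boxes
    {"task_id": "cleanup", "queue": "default"},
]

def tasks_for_worker(all_tasks, worker_queue):
    """Tasks a worker subscribed to `worker_queue` would pick up."""
    return [t["task_id"] for t in all_tasks if t["queue"] == worker_queue]

print(tasks_for_worker(tasks, "high_mem"))  # → ['heavy_train']
```

Workers on the beefy machines then subscribe only to the heavy queue; with the Celery executor that's roughly `airflow worker -q high_mem` (queue name illustrative).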

In the context of Mesos, things may be different, as you may want to do
this at a different layer. I'd read through the MesosExecutor to see if it
does any of this already, or to figure out where you may be able to hook
things up.

Note that (from memory) the MesosExecutor relies on pickling to get
serialized DAGs [through the database] to Mesos slots, and chances are high
that we may deprecate that feature in the future. By that time we'll
probably have a "DagFetcher" abstraction, allowing the DAG definition to be
fetched in another way on the fly.
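For background on why pickling is fragile here: pickle serializes callables by reference (module + name), so anything without an importable name on the other side fails to serialize at all. A minimal illustration, not Airflow code:

```python
import pickle

# A plain data structure round-trips fine:
blob = pickle.dumps({"task_id": "heavy_train", "queue": "high_mem"})
print(pickle.loads(blob)["task_id"])   # → heavy_train

# But callables pickle by *reference*, so anything without an importable
# name -- a lambda, a closure -- cannot be serialized at all:
try:
    pickle.dumps(lambda: "nope")
    pickled = True
except (pickle.PicklingError, AttributeError):
    pickled = False
print(pickled)                          # → False
```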

Max

On Thu, Aug 3, 2017 at 10:24 AM, Victor Monteiro <victor.monteiro@ubee.in>
wrote:

> Hi Stefano, have you read about queues? Airflow has this concept, and I
> think you can decide which queue a task should go to. By doing this and
> integrating it with Mesos, I believe you can set up a Mesos cluster with
> more resources that picks up tasks from a queue dedicated to heavy
> computations.
>
> Maybe this can solve your problem (not sure) :D
>
> 2017-08-03 4:34 GMT-03:00 Stefano Baghino <stefano.baghino@teralytics.ch>:
>
> > Hi everyone,
> >
> > I'm investigating the possibility for our organization to use Airflow for
> > workflow management.
> >
> > Some of our requirements concern resource management, in particular the
> > ability for the system to run tasks on top of Apache Mesos. Airflow only
> > partially satisfies our requirements in that regard: after having a look
> > at the docs and code, it appears to me (correct me if I'm wrong) that
> > resources are set for the whole system (via configuration) and cannot be
> > specified on a per-task basis. We'd need this because some of our jobs
> > are quite lightweight while others may require a lot of resources, which
> > makes a "one-size-fits-all" configuration quite wasteful.
> >
> > I had a look at the AirflowMesosScheduler and MesosExecutor and thought
> > it would be nice to add this feature; perhaps I could add it myself.
> > What I would need is some guidance on how to make this fit into the
> > overall system design: is there an established way to explicitly ask for
> > resources for a specific task in the DAG? If not, what could be a
> > possible way to introduce it? And if this turns out to be outside the
> > scope of Airflow, how do you think I could meet our requirement?
> >
> > Thanks in advance.
> >
> > P.S.: if by any chance some of you are on the Mesos mailing list as
> > well, you may know that I'm having issues making Airflow run
> > successfully on Mesos due to missing Python packages. I'm not sure
> > whether this mailing list is an appropriate place for users to get help;
> > if so, I could probably share that post here as well. Thanks!
> >
> > --
> > Stefano Baghino | TERALYTICS
> > *software engineer*
> >
> > Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
> > phone: +41 43 508 24 57
> > email: stefano.baghino@teralytics.ch
> > www.teralytics.net
> >
> > Company registration number: CH-020.3.037.709-7 | Trade register Canton
> > Zurich
> > Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz,
> > Yann de Vries
> >
> > This e-mail message contains confidential information which is for the
> > sole attention and use of the intended recipient. Please notify us at
> > once if you think that it may not be intended for you and delete it
> > immediately.
>
