From: Maxime Beauchemin
Date: Fri, 4 Aug 2017 14:48:49 -0700
Subject: Re: Per-task resources with Mesos
To:
dev@airflow.incubator.apache.org

At Airbnb, using the Celery executor, we use queues to wire tasks to machines provisioned in specific ways, and we use the cgroup feature to constrain resource utilization as we fire up tasks. That requires running the worker service as root, as it's a requirement to impersonate and use cgroups.

In the context of Mesos things may be different, as you may want to do that on a different layer. I'd read through the MesosExecutor to see if it does any of this already, or to figure out where you may be able to hook things up.

Note that (from memory) the MesosExecutor relies on pickling to get serialized DAGs [through the database] to Mesos slots, and that chances are high that we may deprecate that feature in the future. By that time we'll probably have a "DagFetcher" abstraction, allowing the DAG definition to be fetched in another way on the fly.

Max

On Thu, Aug 3, 2017 at 10:24 AM, Victor Monteiro wrote:

> Hi Stefano, have you read about queues? Airflow has this concept and I
> think you can decide which queue a task should go to. By doing this and
> integrating it with Mesos, I believe you can set up a Mesos cluster with
> more resources to take tasks from a queue dedicated to heavy computations.
>
> Maybe this can solve your problem (not sure) :D
>
> 2017-08-03 4:34 GMT-03:00 Stefano Baghino:
>
> > Hi everyone,
> >
> > I'm investigating the possibility for our organization to use Airflow for
> > workflow management.
> >
> > Some requirements on our side regard resource management, and in particular
> > the possibility for the system to run tasks on top of Apache Mesos.
> > Airflow partially satisfies our requirements in that regard, meaning that
> > after having a look at the docs and code, it appears to me (correct me if
> > I'm wrong) that resources are determined for the whole system (via
> > configuration) and cannot be determined on a per-task basis. We'd need this
> > because some of our jobs are quite lightweight while others may require a
> > lot of resources, making a "one-size-fits-all" configuration quite
> > wasteful.
> >
> > I had a look at the AirflowMesosScheduler and MesosExecutor and thought it
> > would be nice to add this feature, and perhaps I can add it myself. What I
> > would need is some guidance on how to make this fit into the overall system
> > design: is there an established way to explicitly ask for resources for a
> > specific task in the DAG? If not, what could be a possible way to introduce
> > it? And if this turns out to be outside the scope of Airflow, how do you
> > think I can make it meet our requirement?
> >
> > Thanks in advance.
> >
> > P.S.: if by any chance some of you are on the Mesos mailing list as well,
> > you may know that I'm having issues getting Airflow to run successfully
> > on Mesos due to missing Python packages. I'm not sure whether this
> > mailing list is an appropriate place for users to get help. If so, I could
> > probably share that post here as well. Thanks!
> >
> > --
> > Stefano Baghino | TERALYTICS
> > software engineer
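
To make the per-task resource question in this thread concrete, here is a minimal sketch of what "per-task overrides on top of a system-wide default" could look like when matched against resource offers. It is an assumption-laden illustration only: `Resources`, `Offer`, `DEFAULT` and `place` are hypothetical names invented for this sketch and are not part of the Airflow or Mesos APIs, and the first-fit matching is just one simple policy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource request or capacity: CPU cores and memory in MB."""
    cpus: float
    mem_mb: int

    def fits_in(self, other: "Resources") -> bool:
        # True when this request can be satisfied by the given capacity.
        return self.cpus <= other.cpus and self.mem_mb <= other.mem_mb


@dataclass
class Offer:
    """Hypothetical stand-in for a resource offer from one cluster agent."""
    agent: str
    available: Resources


# Assumed system-wide default, used for any task without an override.
DEFAULT = Resources(cpus=1.0, mem_mb=256)


def place(task_id: str,
          overrides: Dict[str, Resources],
          offers: List[Offer]) -> Optional[str]:
    """First-fit placement: return the agent that takes the task, or None."""
    need = overrides.get(task_id, DEFAULT)
    for offer in offers:
        if need.fits_in(offer.available):
            # Deduct what the task consumes from the remaining capacity.
            offer.available = Resources(
                cpus=offer.available.cpus - need.cpus,
                mem_mb=offer.available.mem_mb - need.mem_mb,
            )
            return offer.agent
    return None


if __name__ == "__main__":
    offers = [
        Offer("small-agent", Resources(cpus=2.0, mem_mb=1024)),
        Offer("big-agent", Resources(cpus=16.0, mem_mb=65536)),
    ]
    overrides = {"train_model": Resources(cpus=8.0, mem_mb=32768)}
    print(place("send_email", overrides, offers))
    print(place("train_model", overrides, offers))
```

In this sketch a lightweight task falls back to the default and lands on the first agent with room, while a task with an override is only placed where its larger request fits, which avoids sizing every slot for the heaviest job.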