Subject: Re: Questions about Aurora scheduling policy
From: Riccardo Poggi
Date: Tue, 24 Nov 2015 21:14:49 +0100
Message-ID: <5654C539.7050004@cern.ch>
References: <5653BEAD.1060005@cern.ch>
Reply-To: dev@aurora.apache.org

Thanks Bill,

> I will answer your questions directly, but it may
> also make sense to have a higher-level discussion about your requirements
> to possibly offer alternative approaches.

Sure, that sounds very interesting.

The main scheduling constraint in our system is that it cannot tolerate
an undefined pending period; it needs at least some fast feedback.

Example scenario: the system asks to launch N instances of a given
process with a specified set of resource constraints. The scheduler
should then either

- start them all successfully
- start only a part of them
  + because of resource saturation, for example
- fail to start them
  + because those resources are not available in the farm at all, or
    they are effectively busy, or ...

without queuing, just returning the result of the operation.

There are also other, more functional, requirements. The most
interesting ones probably are:

* Hooks. For good integration with the infrastructure, such as
  publishing of information, access control management, and so on.
* Fault tolerance and error recovery. A failure in either the scheduler
  or the executor shall not take down the managed processes, and shall
  recover them after a restart, for example.
* Ownership. The possibility to set the uid owner of the underlying
  processes.
* Notification system. Provide information about job/process status
  (in either pull or push mode).

>> 3. Would it be possible to have Aurora handle tasks with no resource
>> constraints?
> [...] but the real question is:
> what behavior do you want from scheduling without accounting?

Yes, that is indeed the question. What would be the default scheduling
behaviour in case of resource abundance? Would it be possible to impose
a "spreading" constraint on a subset of the farm that I know a priori is
able to handle a defined set of jobs?

Cheers,
Riccardo

On 11/24/2015 06:38 AM, Bill Farner wrote:
> Welcome, happy to help! I will answer your questions directly, but it may
> also make sense to have a higher-level discussion about your requirements
> to possibly offer alternative approaches.
>
>> 1. Would it be possible to subscribe to state change for a given job/task and
>> receive notifications?
>
> Not today, but I'm very open to the idea. I think there are some cool
> things you could implement with this behavior.
> A concern I often have, though, is that consumers of this data cannot
> handle cases where an event fails to be delivered (i.e. they want a
> replica of the scheduler's state). At any rate, I'd love to offer this
> behavior!
>
>> 2. Would it be possible to set a Pending time-out for tasks that take too long
>> to be Assigned?
>
> Not currently. You could implement this by polling the API and killing
> tasks that took too long to schedule. This would allow you to decide how
> to react (if at all).
>
>> 3. Would it be possible to have Aurora handle tasks with no resource
>> constraints?
>
> No. Both Aurora and Mesos require CPU and memory to be specified for
> tasks. A client of Aurora could choose defaults, but the real question is:
> what behavior do you want from scheduling without accounting?
>
>
> On Mon, Nov 23, 2015 at 5:34 PM, Riccardo Poggi wrote:
>
>> Hello,
>>
>> In order to introduce Aurora into our distributed system we would like to
>> have it slowly, and hopefully transparently, replace what is currently the
>> process manager component. To do that it would have to programmatically
>> interface with other parts of the system that are, at the moment, taking
>> care of what could be considered the "active" orchestration.
>>
>> I've tried Aurora and looked at the docs, but I'm still left with some
>> open questions:
>>
>> 1. Would it be possible to subscribe to state change for a given job/task and
>> receive notifications?
>>
>> 2. Would it be possible to set a Pending time-out for tasks that take too
>> long to be Assigned?
>>
>> 3. Would it be possible to have Aurora handle tasks with no resource
>> constraints?
>>
>>
>> Thanks,
>> Riccardo
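
For reference, the workaround suggested in the answer to question 2 (poll the API and kill tasks that have been pending too long) might look roughly like the sketch below. It is only an illustration under stated assumptions: `list_pending` and `kill_task` are hypothetical callables that the caller would have to wire to whatever scheduler client is in use, and they are not names from the Aurora API. The time-out is measured from the moment the watchdog first sees a task as pending, not from the task's real submission time.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class PendingWatchdog:
    """Kills tasks that stay pending longer than timeout_s.

    list_pending and kill_task are hypothetical hooks supplied by the
    caller; they stand in for real scheduler client calls.
    """
    list_pending: Callable[[], Iterable[str]]  # IDs of tasks currently pending
    kill_task: Callable[[str], None]           # asks the scheduler to kill a task
    timeout_s: float
    clock: Callable[[], float] = time.monotonic
    first_seen: Dict[str, float] = field(default_factory=dict)

    def poll_once(self) -> List[str]:
        """Run one polling pass and return the IDs killed in this pass."""
        now = self.clock()
        pending = set(self.list_pending())

        # Forget tasks that are no longer pending (assigned, finished, ...).
        for task_id in list(self.first_seen):
            if task_id not in pending:
                del self.first_seen[task_id]

        killed = []
        for task_id in sorted(pending):
            # Record the first time this watchdog observed the task pending.
            seen = self.first_seen.setdefault(task_id, now)
            if now - seen >= self.timeout_s:
                self.kill_task(task_id)
                del self.first_seen[task_id]
                killed.append(task_id)
        return killed
```

A caller would invoke `poll_once()` on a fixed interval and use the returned IDs to decide how to react, for example by reporting a failed launch back to the requesting component instead of leaving it queued.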