From: Svetoslav Neykov
To: dev@brooklyn.apache.org
Date: Wed, 11 Jan 2017 14:08:21 +0200
Subject: [PROPOSAL] Controlling effectors concurrency

## Problem

The current model in Brooklyn for executing effectors is to run them in parallel, without regard for already-running instances of the same effector. This makes certain classes of YAML blueprints harder to write, namely use cases which need to limit the number of concurrent executions. Currently this gets worked around on a per-blueprint basis, shifting the burden of synchronizing/locking onto the blueprint, which has limited means to do it.
Some concrete examples:

* A haproxy blueprint which needs to have at most one "update configuration" effector running at a time - solved in bash by using flock: https://github.com/brooklyncentral/clocker/blob/9d3487198f426e8ebc6efeee94af3dc50383fa71/common/catalog/common/haproxy.bom
* Some clusters have a limit on how many members can join at a time (notably Cassandra).
* A DNS blueprint needs to make sure that updates to the records happen sequentially so that no records get lost.
* To avoid API rate limits in certain services we need to limit how many operations we perform at any moment - say we want to limit provisioning of entities, but not installing/launching them.

A first step in solving the above has been made in https://github.com/apache/brooklyn-server/pull/443, which adds "maxConcurrentChildCommands" to the DynamicCluster operations (start, resize, stop). This allows us to limit how many entities get created/destroyed by the cluster in parallel. The goal of this proposal is to extend it by making it possible to apply finer-grained limits (say, just on the launch step of the start effector) and to make it more general (not just start/stop in a cluster, but any effector).

## Proposed solution

Add functionality which allows external code (e.g. adjuncts) to plug into the lifecycle of entities **synchronously** and influence their behaviour. This will allow us to influence the execution of effectors on entities and, for this particular proposal, to block execution until some condition is met.

## Possible approaches (alternatives)

### Effector execution notifications

Provide the functionality to subscribe callbacks which are called when an effector is about to execute on an entity. The callback has the ability to mutate the effector, for example by adding a wrapper task to enforce certain concurrency limits. A simpler alternative would be to add pre- and post-execution callbacks.
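To make the callback idea concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical (none of these types exist in Brooklyn today): an interceptor is invoked synchronously before an effector body runs and may wrap the invocation, here with a shared `java.util.concurrent.Semaphore` that enforces a concurrency limit.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// Hypothetical sketch, not an existing Brooklyn API: a callback that is
// given the effector invocation and may return a wrapped version of it.
interface EffectorInterceptor {
    <T> Callable<T> wrap(String effectorName, Callable<T> invocation);
}

// One possible interceptor: a shared semaphore limiting how many
// invocations of an effector run at the same time.
class ConcurrencyLimitingInterceptor implements EffectorInterceptor {
    private final Semaphore permits;

    ConcurrencyLimitingInterceptor(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    @Override
    public <T> Callable<T> wrap(String effectorName, Callable<T> invocation) {
        return () -> {
            permits.acquire();            // block until a slot is free
            try {
                return invocation.call(); // run the original effector body
            } finally {
                permits.release();        // always free the slot, even on failure
            }
        };
    }
}
```

A cluster-wide limit would then amount to all members sharing one interceptor instance; the semaphore gives blocking semantics for free, which is exactly the "block execution until some condition is met" behaviour the proposal asks for.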
For this to be useful we need to split big effectors into smaller pieces. For example, the start effector would become a composition of provision, install, customize and launch effectors.

The reason not to work at the task level is that tasks are anonymous, so we can't really subscribe to them. To do that we'd need to add identifiers to them, which essentially turns them into effectors.

### Add hooks to the existing effectors

We could add fixed pre and post hooks to the start/stop effectors which execute callbacks synchronously at key points around tasks.

--

Both of the above will allow us to plug additional logic into the lifecycle of entities, making it possible to block execution. For clusters we'd plug into the members' lifecycle and provide cluster-wide limits (say, a semaphore shared by the members). For more complex scenarios we could name the synchronising entity explicitly, for example to block execution until a step in a separate entity is complete (say, registering DNS records after provisioning but before launch, application-wide).

## Examples

Here are some concrete examples which give you a taste of what it would look like (thanks Geoff for sharing these).

### Limit the number of entities starting at any moment in the cluster (but provision them in parallel)

```yaml
services:
- type: cluster
  brooklyn.enrichers:
  ### plugs into the lifecycle-provided callbacks and limits how many tasks
  ### can execute in parallel after provisioning the machines;
  ### by convention concurrency is counted down at the last stage if not explicitly defined
  - type: org.apache.brooklyn.enricher.stock.LimitGroupTasksSemaphore
    brooklyn.config:
      stage: post.provisioning
      parallel.operation.size: auto # meaning the whole cluster; or could be an integer, e.g. 10 for 10-at-a-time
  brooklyn.config:
    initialSize: 50
    memberSpec:
      $brooklyn:entitySpec:
        type: cluster-member
```

---

### Use a third entity to control the concurrency

```yaml
brooklyn.catalog:
  items:
  - id: provisionBeforeInstallCluster
    version: 1.0.0
    item:
      type: cluster
      id: cluster
      brooklyn.parameters:
      - name: initial.cluster.size
        description: Initial Cluster Size
        default: 50
      brooklyn.config:
        initialSize: $brooklyn:config("initial.cluster.size")
        memberSpec:
          $brooklyn:entitySpec:
            type: cluster-member
            brooklyn.enrichers:
            - type: org.apache.brooklyn.enricher.stock.AcquirePermissionToProceed
              brooklyn.config:
                stage: post.provisioning
                ### Delegate the concurrency decisions to the referee entity
                authorisor: $brooklyn:entity("referee")
      brooklyn.children:
      - type: org.apache.brooklyn.entity.TaskRegulationSemaphore
        id: referee
        brooklyn.config:
          initial.value: $brooklyn:entity("cluster").config("initial.cluster.size") # or 1 for sequential execution
```

---

Some thoughts from Alex, from previous discussions, on how it would look in YOML with initd-style effectors:

I'd like to have a semaphore on a normal nodes cluster, and for the `launch` step each node acquires that semaphore, releasing it when confirmed joined.
I could see a task you set in YAML, e.g. if using the initd-ish idea:

```yaml
035-pre-launch-get-semaphore: { acquire-semaphore: { scope: $brooklyn:parent(), name: "node-launch" } }
040-launch: { ssh: "service cassandra start" }
045-confirm-service-up: { wait: { sensor: service.inCluster, timeout: 20m } }
050-finish-release-semaphore: semaphore-release
```

Tasks of type `acquire-semaphore` would use (creating if needed) a named semaphore against the given entity … but somehow we need to say when it should automatically be released (e.g. on failure), in addition to the explicit release (the `050`, which assumes some scope; not sure how/if to implement that).

---

Thanks to Geoff who shared his thoughts on the subject, with this post based on them.

Svet.
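P.S. The release semantics discussed above (explicit release at the `050` step, plus automatic release if an earlier step fails) can be sketched in plain Java with a try/finally. This is an illustrative sketch only; `GuardedLaunch` and `runGuarded` are made-up names, not Brooklyn classes.

```java
import java.util.concurrent.Semaphore;

// Sketch of the acquire/release semantics: the semaphore acquired before
// launch is explicitly released on success, and automatically released
// if any step in between fails.
class GuardedLaunch {
    static void runGuarded(Semaphore nodeLaunch, Runnable launchSteps) throws InterruptedException {
        nodeLaunch.acquire();             // 035-pre-launch-get-semaphore
        boolean released = false;
        try {
            launchSteps.run();            // 040-launch .. 045-confirm-service-up
            nodeLaunch.release();         // 050-finish-release-semaphore (explicit)
            released = true;
        } finally {
            if (!released) {
                nodeLaunch.release();     // automatic release on failure
            }
        }
    }
}
```

Whether the failure path succeeds or throws, the permit is returned, so a failed node never starves the rest of the cluster of launch slots.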