From: Svetoslav Neykov
To: dev@brooklyn.apache.org
Date: Wed, 11 Jan 2017 14:08:21 +0200
Subject: [PROPOSAL] Controlling effectors concurrency

## Problem

The current model in Brooklyn for executing effectors is to run them in parallel, without regard for already-running instances of the same effector. This makes certain classes of YAML blueprints harder to write, namely use cases which need to limit the number of concurrent executions. Currently this gets worked around on a per-blueprint basis, shifting the burden of synchronizing/locking onto the blueprint, which has limited means to do it.
Some concrete examples:

* A haproxy blueprint which needs to have at most one "update configuration" effector running at a time - solved in bash by using flock: https://github.com/brooklyncentral/clocker/blob/9d3487198f426e8ebc6efeee94af3dc50383fa71/common/catalog/common/haproxy.bom
* Some clusters have a limit on how many members can join at a time (notably Cassandra).
* A DNS blueprint needs to make sure that updates to the records happen sequentially so that no records get lost.
* To avoid API rate limits in certain services we need to limit how many operations we perform at any moment - say we want to limit provisioning of entities, but not installing/launching them.

A first step in solving the above has been made in https://github.com/apache/brooklyn-server/pull/443, which adds "maxConcurrentChildCommands" to the DynamicCluster operations (start, resize, stop). This allows us to limit how many entities get created/destroyed by the cluster in parallel. The goal of this proposal is to extend it by making it possible to apply finer-grained limits (say, just on the launch step of the start effector) and to make it more general (not just start/stop in a cluster, but any effector).

## Proposed solution

Add functionality which allows external code (e.g. adjuncts) to plug into the lifecycle of entities **synchronously** and influence their behaviour. This will allow us to influence the execution of effectors on entities and, for this particular proposal, to block execution until some condition is met.

## Possible approaches (alternatives)

### Effector execution notifications

Provide the functionality to subscribe callbacks which are called when an effector is about to execute on an entity. The callback has the ability to mutate the effector, for example by adding a wrapper task to enforce certain concurrency limits. A simpler alternative would be to add pre- and post-execution callbacks.
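To make the callback idea concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical (none of these types exist in Brooklyn today): an interceptor is invoked synchronously before an effector body runs and may wrap the invocation, here with a shared `java.util.concurrent.Semaphore` that enforces a concurrency limit.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// Hypothetical sketch, not an existing Brooklyn API: a callback that is
// given the effector invocation and may return a wrapped version of it.
interface EffectorInterceptor {
    <T> Callable<T> wrap(String effectorName, Callable<T> invocation);
}

// One possible interceptor: a shared semaphore limiting how many
// invocations of an effector run at the same time.
class ConcurrencyLimitingInterceptor implements EffectorInterceptor {
    private final Semaphore permits;

    ConcurrencyLimitingInterceptor(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    @Override
    public <T> Callable<T> wrap(String effectorName, Callable<T> invocation) {
        return () -> {
            permits.acquire();            // block until a slot is free
            try {
                return invocation.call(); // run the original effector body
            } finally {
                permits.release();        // always free the slot, even on failure
            }
        };
    }
}
```

A cluster-wide limit would then amount to all members sharing one interceptor instance; the semaphore gives blocking semantics for free, which is exactly the "block execution until some condition is met" behaviour the proposal asks for.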
For this to be useful we need to split big effectors into smaller pieces. For example, the start effector would become a composition of provision, install, customize and launch effectors.

The reason not to work at the task level is that tasks are anonymous, so we can't really subscribe to them. To do that we'd need to add identifiers to them, which essentially turns them into effectors.

### Add hooks to the existing effectors

We could add fixed pre and post hooks to the start/stop effectors which execute callbacks synchronously at key points around tasks.

--

Both of the above will allow us to plug additional logic into the lifecycle of entities, making it possible to block execution. For clusters we'd plug into the members' lifecycle and provide cluster-wide limits (say, a semaphore shared by the members). For more complex scenarios we could name the synchronising entity explicitly, for example to block execution until a step in a separate entity is complete (say, registering DNS records after provisioning but before launch, application-wide).

## Examples

Here are some concrete examples which give you a taste of what it would look like (thanks Geoff for sharing these).

### Limit the number of entities starting at any moment in the cluster (but provision them in parallel)

```yaml
services:
- type: cluster
  brooklyn.enrichers:
  ### plugs into the lifecycle-provided callbacks and limits how many tasks
  ### can execute in parallel after provisioning the machines;
  ### by convention concurrency is counted down at the last stage if not explicitly defined
  - type: org.apache.brooklyn.enricher.stock.LimitGroupTasksSemaphore
    brooklyn.config:
      stage: post.provisioning
      parallel.operation.size: auto # meaning the whole cluster; or could be an integer, e.g. 10 for 10-at-a-time
  brooklyn.config:
    initialSize: 50
    memberSpec:
      $brooklyn:entitySpec:
        type: cluster-member
```

---

### Use a third entity to control the concurrency

```yaml
brooklyn.catalog:
  items:
  - id: provisionBeforeInstallCluster
    version: 1.0.0
    item:
      type: cluster
      id: cluster
      brooklyn.parameters:
      - name: initial.cluster.size
        description: Initial Cluster Size
        default: 50
      brooklyn.config:
        initialSize: $brooklyn:config("initial.cluster.size")
        memberSpec:
          $brooklyn:entitySpec:
            type: cluster-member
            brooklyn.enrichers:
            - type: org.apache.brooklyn.enricher.stock.AcquirePermissionToProceed
              brooklyn.config:
                stage: post.provisioning
                ### Delegate the concurrency decisions to the referee entity
                authorisor: $brooklyn:entity("referee")
      brooklyn.children:
      - type: org.apache.brooklyn.entity.TaskRegulationSemaphore
        id: referee
        brooklyn.config:
          initial.value: $brooklyn:entity("cluster").config("initial.cluster.size") # or 1 for sequential execution
```

---

Some thoughts from Alex, from previous discussions, on how it would look in YOML with initd-style effectors:

I'd like to have a semaphore on a normal nodes cluster, and for the `launch` step each node acquires that semaphore, releasing it when confirmed joined.
I could see a task you set in YAML, e.g. if using the initd-ish idea:

```yaml
035-pre-launch-get-semaphore: { acquire-semaphore: { scope: $brooklyn:parent(), name: "node-launch" } }
040-launch: { ssh: "service cassandra start" }
045-confirm-service-up: { wait: { sensor: service.inCluster, timeout: 20m } }
050-finish-release-semaphore: semaphore-release
```

Tasks of type `acquire-semaphore` would use (creating if needed) a named semaphore against the given entity … but somehow we need to say when it should automatically be released (e.g. on failure), in addition to the explicit release (the `050`, which assumes some scope; not sure how/if to implement that).

---

Thanks to Geoff who shared his thoughts on the subject, with this post based on them.

Svet.
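P.S. The release semantics discussed above (explicit release at the `050` step, plus automatic release if an earlier step fails) can be sketched in plain Java with a try/finally. This is an illustrative sketch only; `GuardedLaunch` and `runGuarded` are made-up names, not Brooklyn classes.

```java
import java.util.concurrent.Semaphore;

// Sketch of the acquire/release semantics: the semaphore acquired before
// launch is explicitly released on success, and automatically released
// if any step in between fails.
class GuardedLaunch {
    static void runGuarded(Semaphore nodeLaunch, Runnable launchSteps) throws InterruptedException {
        nodeLaunch.acquire();             // 035-pre-launch-get-semaphore
        boolean released = false;
        try {
            launchSteps.run();            // 040-launch .. 045-confirm-service-up
            nodeLaunch.release();         // 050-finish-release-semaphore (explicit)
            released = true;
        } finally {
            if (!released) {
                nodeLaunch.release();     // automatic release on failure
            }
        }
    }
}
```

Whether the failure path succeeds or throws, the permit is returned, so a failed node never starves the rest of the cluster of launch slots.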