brooklyn-dev mailing list archives

From "Aled Sage (JIRA)" <>
Subject [jira] [Created] (BROOKLYN-16) Quarantine group: improve functionality and usability
Date Tue, 01 Jul 2014 15:33:24 GMT
Aled Sage created BROOKLYN-16:

             Summary: Quarantine group: improve functionality and usability
                 Key: BROOKLYN-16
             Project: Brooklyn
          Issue Type: Improvement
            Reporter: Aled Sage

I'd like us to clean up the behaviour and appearance of the "quarantine group" of clusters.
My recent experience with some enterprise users highlights that it's confusing!

The configuration "dynamiccluster.quarantineFailedEntities" controls whether failed members
of the cluster should be quarantined, or just deleted straight away.

Once an entity goes into quarantine, there is currently no way to get it out again (except
deleting or discarding the entity).

However, it is good that we don't unquarantine nodes automatically (e.g. on the entity going
to service-up again), because a node may have been quarantined for good reason, such as
repeatedly going up and down.

PROPOSAL 1: We should have an explicit effector on the quarantine group entity to move the
member back into the cluster's group of healthy members.

PROPOSAL 2: We should add a dynamic effector to each member of the quarantined group for "restoreFromQuarantine",
which would add the member back into the cluster's group.
A user could invoke this effector by selecting the member in the web-console.
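
A minimal sketch of the membership change that Proposals 1 and 2 describe, written against
a plain-Java stand-in rather than Brooklyn's real entity and effector API. The class
`ClusterModel` and the helper `quarantine` are hypothetical names used only for illustration;
`restoreFromQuarantine` is the effector name suggested above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a cluster with a quarantine group; not Brooklyn's API.
class ClusterModel {
    private final Set<String> healthy = new LinkedHashSet<>();
    private final Set<String> quarantined = new LinkedHashSet<>();

    void addMember(String id) {
        healthy.add(id);
    }

    // Moves a healthy member into the quarantine group.
    void quarantine(String id) {
        if (!healthy.remove(id)) {
            throw new IllegalArgumentException(id + " is not a healthy member");
        }
        quarantined.add(id);
    }

    // Explicit, user-invoked restore: nothing here runs automatically on service-up.
    void restoreFromQuarantine(String id) {
        if (!quarantined.remove(id)) {
            throw new IllegalArgumentException(id + " is not quarantined");
        }
        healthy.add(id);
    }

    // Defensive copies, so callers cannot change membership behind the model's back.
    Set<String> healthyMembers() {
        return new LinkedHashSet<>(healthy);
    }

    Set<String> quarantinedMembers() {
        return new LinkedHashSet<>(quarantined);
    }
}
```

Rejecting a restore for a member that is not quarantined keeps the two groups disjoint, which
is the property a user relies on when reading the entity tree.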

PROPOSAL 3: We could add an effector "restartMembers(boolean parallel)" on the quarantine
group. Invoking this would restart the process for each member of the quarantine group. If
parallel==true then this would be done in parallel, otherwise one member at a time.
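
The difference between the two modes of `restartMembers(boolean parallel)` could look roughly
like the sketch below. It uses a plain `ExecutorService` and a hypothetical `Restartable`
interface in place of Brooklyn's own task and entity types, so the class names and the
error handling are assumptions rather than a proposed implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for an entity that can be restarted; not Brooklyn's API.
interface Restartable {
    void restart();
}

class QuarantineRestarter {
    // Restarts every member, either concurrently or one member at a time.
    static void restartMembers(List<? extends Restartable> members, boolean parallel) {
        if (!parallel) {
            for (Restartable member : members) {
                member.restart();
            }
            return;
        }
        if (members.isEmpty()) {
            return; // a fixed thread pool cannot be created with zero threads
        }
        ExecutorService pool = Executors.newFixedThreadPool(members.size());
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Restartable member : members) {
                Runnable task = member::restart;
                futures.add(pool.submit(task));
            }
            for (Future<?> future : futures) {
                try {
                    future.get(); // wait for completion and surface any failure
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while restarting members", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("a member failed to restart", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

In the sequential mode a failing restart stops the loop at that member; in the parallel mode
all restarts are submitted before the first failure is reported.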

PROPOSAL 4: We should have an explicit effector on the cluster to quarantine a member.

There is an expungeMembers effector on the quarantine group. This takes a single parameter
of "boolean firstStop", which controls whether it calls entity.stop() before unmanaging each member.

The parameter name is confusing. Also, the behaviour is very different for the two parameter
values, so it potentially deserves two separate effectors.

Note this feels related to the "expunge" operation under the "lifecycle" tab of the web-console.
There, it brings up a modal dialog with "Unmanage an entity and (optionally) clean up resources,
such as releasing a VM" and a checkbox for "Release resources".
The user feedback there was that it isn't the behaviour they expected when clicking "expunge",
and that the behaviour was so different with the box ticked or unticked that it deserved two
different operations.

PROPOSAL 5: replace the existing effector with two effectors: `unmanageMembers()` would just
unmanage the entities without stopping or freeing the resources; `stopAndUnmanageMembers()`
would first release the resources of each member (e.g. VMs etc, by calling entity.stop) and
would then unmanage each.
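
As a rough illustration of how the two effectors in Proposal 5 would differ, here is a
plain-Java sketch. `QuarantinedMember` and `QuarantineGroupModel` are hypothetical stand-ins,
not Brooklyn classes, and "unmanage" is modelled simply as dropping the member from the group.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a quarantined entity; not Brooklyn's API.
class QuarantinedMember {
    final String id;
    boolean stopped = false;

    QuarantinedMember(String id) {
        this.id = id;
    }

    // Stands in for entity.stop(), i.e. releasing VMs and other resources.
    void stop() {
        stopped = true;
    }
}

class QuarantineGroupModel {
    private final List<QuarantinedMember> members = new ArrayList<>();

    void add(QuarantinedMember member) {
        members.add(member);
    }

    int size() {
        return members.size();
    }

    // Forgets every member without stopping it; its resources keep running.
    void unmanageMembers() {
        members.clear();
    }

    // Stops each member first, then forgets it.
    void stopAndUnmanageMembers() {
        for (QuarantinedMember member : members) {
            member.stop();
        }
        members.clear();
    }
}
```

With two separately named operations, the caller states the intent in the method name instead
of in a boolean argument whose meaning has to be looked up.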

Quarantine alternative
In our use-case, we're using docker. What we really want for this kind of failed node is to...
generate a dump of the running process, and then stop the container (thus preserving the disk).
We want the entity to be discarded from the cluster.

PROPOSAL 6: Add another config option to DynamicCluster for failedEntityHandler. This would
take an instance of something like:

    public interface FailedEntityHandler {
        public enum HandlerResponse {
            // ...
        }
        HandlerResponse onFailedEntity(DynamicCluster cluster, Entity failedMember);
    }
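
To make the Docker use-case concrete, a handler along these lines might be plugged in. This is
only a sketch under stated assumptions: the `Entity` and `DynamicCluster` interfaces below are
empty stand-ins for the real Brooklyn types, the response values `QUARANTINE` and `DISCARD` are
invented here because the message does not list them, and the handler records its steps rather
than talking to Docker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Brooklyn's Entity and DynamicCluster types.
interface Entity {
    String getId();
}

interface DynamicCluster {
}

interface FailedEntityHandler {
    // Assumed values for illustration; the proposal does not name them.
    enum HandlerResponse { QUARANTINE, DISCARD }

    HandlerResponse onFailedEntity(DynamicCluster cluster, Entity failedMember);
}

// Records the steps of the Docker use-case instead of performing them.
class DumpAndStopHandler implements FailedEntityHandler {
    final List<String> actions = new ArrayList<>();

    @Override
    public HandlerResponse onFailedEntity(DynamicCluster cluster, Entity failedMember) {
        actions.add("dump process of " + failedMember.getId());
        actions.add("stop container of " + failedMember.getId());
        // Ask the cluster to drop the member rather than quarantine it.
        return HandlerResponse.DISCARD;
    }
}
```

Returning a response value lets the cluster stay in charge of membership, while the handler
only decides what should happen to the failed node itself.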

Currently... if quarantined, then the entity tree (in the web-console) shows a "quarantine
group" underneath (i.e. as a child of) the cluster.

All entities in the cluster (be they members of the quarantine group or healthy members of
the cluster) appear under the cluster itself. This is because their *parent* is the cluster.
An entity's parent never changes. What the user is really interested in here is seeing the group

There's a separate conversation to be had (or resurrected) about visualising groups (and other
relationships) in the web-console. This use-case should be considered there.

This message was sent by Atlassian JIRA
