brooklyn-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Geoff Macartney <geoff.macart...@cloudsoftcorp.com>
Subject Re: YAML Blueprints Failing to Stop
Date Fri, 04 Dec 2015 10:46:16 GMT
hi Mike

my 0.2¢ worth - in general terms it is going to be difficult to analyse these sorts of dependencies,
either for Brooklyn programmatically, or for blueprint authors when constructing their blueprints.
 The potential complexity of an arbitrary network of dependencies is unlimited.   It feels
like having this sort of shut-down-in-this-order instruction in the blueprints could be brittle
and error-prone.  You can imagine that it would work OK in simple cases such as this one,
but when it went wrong in complex cases it would be a real pig to debug!

Perhaps the fundamental problem is that the attributeWhenReady is waiting when it doesn’t
make sense to do so?  I see in the code that this method already avoids waiting if the depended-on
entity is on fire [1]:


       /**
         * Will wait for the attribute on the given entity.
         * If that entity reports {@link Lifecycle#ON_FIRE} for its {@link Attributes#SERVICE_STATE}
then it will abort. 
         */
        public <T2> Builder<T2,T2> attributeWhenReady(Entity source, AttributeSensor<T2>
sensor) {
            return new Builder<T2,T2>(source, sensor).abortIfOnFire();
        }

Should this abortIfOnFire() not really be a call like abortIfNotWorthIt(), which would avoid
the wait if the state was any of ON_FIRE, STOPPING, STOPPED, or DESTROYED?  Would it ever
make sense to wait if the entity were in one of these states?

If that change isn’t valid for some reason, at the very least I see a bit further down the
code [2]

        /** specifies an optional timeout; by default it waits forever, or until unmanaged
or other abort condition */
        public Builder<T,V> timeout(Duration val) {

is “forever” a sensible default?  My gut feeling is that “forever” should be something
that a user should have to choose explicitly, precisely because of this sort of case.   Maybe
we could have some other default, either a fixed value, or one with a small degree of smartness,
where the default again could be context-dependent on the state (e.g. only wait for 1 minute
if the entity is shutting down etc.)

Just a thought, cheers,
Geoff


[1] https://github.com/apache/incubator-brooklyn/blob/master/core/src/main/java/org/apache/brooklyn/core/sensor/DependentConfiguration.java#L641
[2] https://github.com/apache/incubator-brooklyn/blob/master/core/src/main/java/org/apache/brooklyn/core/sensor/DependentConfiguration.java#L729


————————————————————
Gnu PGP key - http://is.gd/TTTTuI


> On 4 Dec 2015, at 05:43, Mike Zaccardo <mike.zaccardo@cloudsoftcorp.com> wrote:
> 
> I went ahead and partially implemented the second, simpler solution[1].
> With this fix, the blueprint I posted above which demonstrates this problem
> will now stop when proper `stop.sequence.position` values are added to the
> YAML file.
> 
> Any thoughts (posted here or in the PR)?
> 
> -Mike
> 
> [1] https://github.com/apache/incubator-brooklyn/pull/1088
> 
> On Thu, Dec 3, 2015 at 3:02 PM Mike Zaccardo <
> mike.zaccardo@cloudsoftcorp.com> wrote:
> 
>> Problem reduced to simpler and more generic terms:
>> 
>> -Stop is called on a BasicApplication "A"
>> -Stop is then called on A's children, X and Y, in parallel[1]
>> -Y has a `shell.env` value which is an `attributeWhenReady` of a value
>> from X
>> -Y hangs infinitely trying to resolve the value from X which has already
>> stopped, so Y never stops
>> 
>> Potential solutions:
>> 
>> -More complicated to implement but more robust: Brooklyn analyzes children
>> entities and detects these types of dependencies, and then schedules the
>> shutdown process accordingly, resulting in safe sequential shut down vs
>> currently broken parallel shutdown.
>> 
>> -Simpler to implement but puts the onus on the blueprint author: add a
>> value to the DSL which represents a numerical dependency hierarchy. In this
>> case, X would have a value of 1 and Y would have a value of 2 (assigned in
>> the YAML blueprint). Upon stop, the entities will stop sequentially in
>> decreasing numerical order (2->Y and then 1->X).
>> 
>> WDYT?
>> 
>> -Mike
>> 
>> [1]
>> https://github.com/apache/incubator-brooklyn/blob/master/core/src/main/java/org/apache/brooklyn/core/entity/trait/StartableMethods.java#L59
>> 
>> On Wed, Dec 2, 2015 at 8:52 PM Mike Zaccardo <
>> mike.zaccardo@cloudsoftcorp.com> wrote:
>> 
>>> Hi all,
>>> 
>>> When attempting to stop some of my more complicated YAML blueprints I
>>> have encountered situations where certain entities will hang on the stop
>>> phase, preventing the application from disappearing.
>>> 
>>> Below is a dummy example which demonstrates the problem.  Note that this
>>> makes use of the `all.members.up` sensor which is not in master, so my
>>> branch[1] must be used if you'd like to run it yourself.
>>> 
>>> BOM file: https://gist.github.com/mikezaccardo/39cc5065c10efde765d5
>>> YAML file: https://gist.github.com/mikezaccardo/06dc8ba981694c983399
>>> 
>>> When the "Cluster Stop Test" entity is expunged, the "Core Cluster"
>>> entity will stop correctly, but the "Secondary Cluster Entity" will hang.
>>> This occurs because each of the secondary cluster members will fail to
>>> resolve a dependent value, specifically stuck on `Retrieving
>>> $brooklyn:entity("my-core-cluster").attributeWhenReady("host.address.list.comma_separated")`
>>> because the core cluster has already shut down.
>>> 
>>> However, if I debug and add a breakpoint to `DynamicClusterImpl` stop
>>> method, and manually ensure that the "Secondary Cluster" stops first before
>>> "Core Cluster", then the expunge will proceed fully without error.
>>> 
>>> The reason why I am not fully convinced that this is a Brooklyn bug is
>>> because I reference the "my-core-cluster" ID in the BOM file, yet this is
>>> an ID that is specified in the YAML file, which to me is a bad smell.  Is
>>> that, in fact, bad practice from a blueprinting perspective?  If so then I
>>> do not know how I'd properly reference that cluster by ID in the BOM.
>>> 
>>> WDYT?  I'd be glad to 1:1 with someone to think through this more.
>>> 
>>> Cheers,
>>> Mike
>>> 
>>> [1]
>>> https://github.com/mikezaccardo/incubator-brooklyn/tree/all-members-up-sensor
>>> 
>>> 
>>> 


Mime
View raw message