brooklyn-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Zaccardo <mike.zacca...@cloudsoftcorp.com>
Subject Re: YAML Blueprints Failing to Stop
Date Fri, 04 Dec 2015 20:03:17 GMT
Thank you both for your input!  I agree with both general points; Brooklyn
keeping track of all of the aWR is an intense undertaking, and manually
specifying the shutdown order *can* work but it definitely is error prone
and likely not something we want to ask the author to deal with.

I will investigate the aWR code and look to abort on shutdown, good idea.
There's only one concern I have with that: `stop` scripts would not be able
to reference any of the `shell.env` values that make problematic aWR
calls.  None of my blueprints require that, but it is definitely an effect
of this change.  Stopping in order does not present this issue (but obv has
a host of others).

-Mike

On Fri, Dec 4, 2015 at 2:46 AM Geoff Macartney <
geoff.macartney@cloudsoftcorp.com> wrote:

> hi Mike
>
> my 0.2¢ worth - in general terms it is going to be difficult to analyse
> these sorts of dependencies, either for Brooklyn programmatically, or for
> blueprint authors when constructing their blueprints.  The potential
> complexity of an arbitrary network of dependencies is unlimited.   It feels
> like having this sort of shut-down-in-this-order instruction in the
> blueprints could be brittle and error-prone.  You can imagine that it would
> work OK in simple cases such as this one, but when it went wrong in complex
> cases it would be a real pig to debug!
>
> Perhaps the fundamental problem is that the attributeWhenReady is waiting
> when it doesn’t make sense to do so?  I see in the code that this method
> already avoids waiting if the depended-on entity is on fire [1]:
>
>
>        /**
>          * Will wait for the attribute on the given entity.
>          * If that entity reports {@link Lifecycle#ON_FIRE} for its {@link
> Attributes#SERVICE_STATE} then it will abort.
>          */
>         public <T2> Builder<T2,T2> attributeWhenReady(Entity source,
> AttributeSensor<T2> sensor) {
>             return new Builder<T2,T2>(source, sensor).abortIfOnFire();
>         }
>
> Should this abortIfOnFire() not really be a call like abortIfNotWorthIt(),
> which would avoid the wait if the state was any of ON_FIRE, STOPPING,
> STOPPED, or DESTROYED?  Would it ever make sense to wait if the entity were
> in one of these states?
>
> If that change isn’t valid for some reason, at the very least I see a bit
> further down the code [2]
>
>         /** specifies an optional timeout; by default it waits forever, or
> until unmanaged or other abort condition */
>         public Builder<T,V> timeout(Duration val) {
>
> is “forever” a sensible default?  My gut feeling is that “forever” should
> be something that a user should have to choose explicitly, precisely
> because of this sort of case.   Maybe we could have some other default,
> either a fixed value, or one with a small degree of smartness, where the
> default again could be context-dependent on the state (e.g. only wait for 1
> minute if the entity is shutting down etc.)
>
> Just a thought, cheers,
> Geoff
>
>
> [1]
> https://github.com/apache/incubator-brooklyn/blob/master/core/src/main/java/org/apache/brooklyn/core/sensor/DependentConfiguration.java#L641
> [2]
> https://github.com/apache/incubator-brooklyn/blob/master/core/src/main/java/org/apache/brooklyn/core/sensor/DependentConfiguration.java#L729
>
>
> ————————————————————
> Gnu PGP key - http://is.gd/TTTTuI
>
>
> > On 4 Dec 2015, at 05:43, Mike Zaccardo <mike.zaccardo@cloudsoftcorp.com>
> wrote:
> >
> > I went ahead and partially implemented the second, simpler solution[1].
> > With this fix, the blueprint I posted above which demonstrates this
> problem
> > will now stop when proper `stop.sequence.position` values are added to
> the
> > YAML file.
> >
> > Any thoughts (posted here or in the PR)?
> >
> > -Mike
> >
> > [1] https://github.com/apache/incubator-brooklyn/pull/1088
> >
> > On Thu, Dec 3, 2015 at 3:02 PM Mike Zaccardo <
> > mike.zaccardo@cloudsoftcorp.com> wrote:
> >
> >> Problem reduced to simpler and more generic terms:
> >>
> >> -Stop is called on a BasicApplication "A"
> >> -Stop is then called on A's children, X and Y, in parallel[1]
> >> -Y has a `shell.env` value which is an `attributeWhenReady` of a value
> >> from X
> >> -Y hangs infinitely trying to resolve the value from X which has already
> >> stopped, so Y never stops
> >>
> >> Potential solutions:
> >>
> >> -More complicated to implement but more robust: Brooklyn analyzes
> children
> >> entities and detects these types of dependencies, and then schedules the
> >> shutdown process accordingly, resulting in safe sequential shut down vs
> >> currently broken parallel shutdown.
> >>
> >> -Simpler to implement but puts the onus on the blueprint author: add a
> >> value to the DSL which represents a numerical dependency hierarchy. In
> this
> >> case, X would have a value of 1 and Y would have a value of 2 (assigned
> in
> >> the YAML blueprint). Upon stop, the entities will stop sequentially in
> >> decreasing numerical order (2->Y and then 1->X).
> >>
> >> WDYT?
> >>
> >> -Mike
> >>
> >> [1]
> >>
> https://github.com/apache/incubator-brooklyn/blob/master/core/src/main/java/org/apache/brooklyn/core/entity/trait/StartableMethods.java#L59
> >>
> >> On Wed, Dec 2, 2015 at 8:52 PM Mike Zaccardo <
> >> mike.zaccardo@cloudsoftcorp.com> wrote:
> >>
> >>> Hi all,
> >>>
> >>> When attempting to stop some of my more complicated YAML blueprints I
> >>> have encountered situations where certain entities will hang on the
> stop
> >>> phase, preventing the application from disappearing.
> >>>
> >>> Below is a dummy example which demonstrates the problem.  Note that
> this
> >>> makes use of the `all.members.up` sensor which is not in master, so my
> >>> branch[1] must be used if you'd like to run it yourself.
> >>>
> >>> BOM file: https://gist.github.com/mikezaccardo/39cc5065c10efde765d5
> >>> YAML file: https://gist.github.com/mikezaccardo/06dc8ba981694c983399
> >>>
> >>> When the "Cluster Stop Test" entity is expunged, the "Core Cluster"
> >>> entity will stop correctly, but the "Secondary Cluster Entity" will
> hang.
> >>> This occurs because each of the secondary cluster members will fail to
> >>> resolve a dependent value, specifically stuck on `Retrieving
> >>>
> $brooklyn:entity("my-core-cluster").attributeWhenReady("host.address.list.comma_separated")`
> >>> because the core cluster has already shut down.
> >>>
> >>> However, if I debug and add a breakpoint to `DynamicClusterImpl` stop
> >>> method, and manually ensure that the "Secondary Cluster" stops first
> before
> >>> "Core Cluster", then the expunge will proceed fully without error.
> >>>
> >>> The reason why I am not fully convinced that this is a Brooklyn bug is
> >>> because I reference the "my-core-cluster" ID in the BOM file, yet this
> is
> >>> an ID that is specified in the YAML file, which to me is a bad smell.
> Is
> >>> that, in fact, bad practice from a blueprinting perspective?  If so
> then I
> >>> do not know how I'd properly reference that cluster by ID in the BOM.
> >>>
> >>> WDYT?  I'd be glad to 1:1 with someone to think through this more.
> >>>
> >>> Cheers,
> >>> Mike
> >>>
> >>> [1]
> >>>
> https://github.com/mikezaccardo/incubator-brooklyn/tree/all-members-up-sensor
> >>>
> >>>
> >>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message