brooklyn-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thomas Bouron <thomas.bou...@cloudsoftcorp.com>
Subject Re: Fixing BROOKLYN-534: service.state.expected value on error?
Date Wed, 27 Sep 2017 14:06:40 GMT
Hi Aled.

Option 1 is definitely simpler. While it let the author decide what to do,
it means that each policy has to have an ad-hoc behaviour based on what the
"expected state" value is, which might not reflect the real expected state.
As you said in you example, the expected state is "on-fire" which should
never be IMO. It should always reflect... well, the expected state so
either (starting|started|stopping|stopped).

This is why I'm leaning toward option 2. I appreciate that it can (will?)
be fiddly to implement this change across all entities/enrichers/policies
but I think it is worth the effort. Also, this means that we will have much
better information about what went wrong, the UI and CLI will benefit from
it (this obviously assume we do some work there to make those "problem
indicators" more visible)

Best.

On Mon, 18 Sep 2017 at 16:07 Aled Sage <aled.sage@gmail.com> wrote:

> Hi all,
>
> TL;DR: low-level discussion of error-handling, and entity state; should
> we change what we set the entity's "expected state" to when there are
> errors?
>
> ---
> I'm trying to fix https://issues.apache.org/jira/browse/BROOKLYN-534
> (when stopping, ensure the `ServiceRestarter` doesn't restart the
> entity, provisioning a new machine, in the situation where stop fails
> for some reason).
>
> See the failing test I've added in [1].
>
> The problem is that, when the `location.release(machine)` fails, it sets
> the entity's *expected state* to "on-fire".  This means we've lost all
> record of whether we wanted to be stopped or running. The
> ServiceRetarter therefore (sensibly?!) responds to the "on-fire" by
> restarting it!
>
> Background: software prcoess entities normally have an "expected state"
> saying what state they want to be in, and an "actual service state"
> saying what it really is. The latter is inferred from a combination of
> the expected state, whether the service.isUp, and also any explicit
> "problem indicators" that have been set.
>
> ---
> Options for fixing this include:
>
>  1. If the expected state is "on-fire", then don't do anything in such
>     policies - instead rely on whatever set that expected state to deal
>     with it (e.g. if something else called stop, which failed, then it
>     should deal with the error itself).
>  2. Don't set expected state to on-fire. Instead keep the expected state
>     as what we really want it to be, but include an appropriate "problem
>     indicator" (e.g. to say that "stop" failed).
>
> Option (1) would probably be easy, but it's unclear where you'd hook in
> logic to deal with such failures. It also feels wrong that we've lost
> all knowledge of what we were trying to do.
>
> Option (2) feels best. However, this stuff is fiddly. We'd need to make
> the sure the "problem indicator" is cleared correctly when an entity
> tries again to start or stop. If it's not cleared, then the entity will
> likely never report itself as recovered from that problem. For example,
> see all the calls to ServiceStateLogic.setExpectedState() [3].
>
> Is it enough to change [2] (to set a "problem indicator", and to set the
> expected state to "stopped"), and to clear that problem indicator in
> start [4] and restart [5]?
>
> Before I go down this road, does anyone have comments, questions, advice
> or strong opinions?
>
> Aled
>
> [1] https://github.com/apache/brooklyn-server/pull/829
> [2]
>
> https://github.com/apache/brooklyn-server/blob/master/software/base/src/main/java/org/apache/brooklyn/entity/software/base/lifecycle/MachineLifecycleEffectorTasks.java#L869
> (in the doStop method's catch block)
> [3]
>
> https://github.com/apache/brooklyn-server/blob/master/core/src/main/java/org/apache/brooklyn/core/entity/lifecycle/ServiceStateLogic.java#L158
> [4]
>
> https://github.com/apache/brooklyn-server/blob/master/software/base/src/main/java/org/apache/brooklyn/entity/software/base/lifecycle/MachineLifecycleEffectorTasks.java#L339
> [5]
>
> https://github.com/apache/brooklyn-server/blob/master/software/base/src/main/java/org/apache/brooklyn/entity/software/base/lifecycle/MachineLifecycleEffectorTasks.java#L661
>
> --

Thomas Bouron • Senior Software Engineer @ Cloudsoft Corporation •
https://cloudsoft.io/
Github: https://github.com/tbouron
Twitter: https://twitter.com/eltibouron

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message