sling-dev mailing list archives

From Georg Henzler <slin...@ghenzler.de>
Subject Re: ResultRegistry for Health Checks -> StickyResults instead?
Date Tue, 06 Jun 2017 23:01:01 GMT
Hi,

> The goal is to declare health check results that remain valid for a
> specified time or forever.

I agree that metrics as proposed in comment [1] cannot achieve this 
(they are limited to 1, 5 and 15 minute time windows). However, I still 
think a purely declarative approach is cleaner and will lead to more 
consistency across HCs: we could introduce an HC property 
"hc.keepWarnStickyForMin" (and "hc.keepCriticalStickyForMin"). This can 
be implemented entirely in the impl package and would not require a new 
API. For the "event queue overflowed" example, the property 
hc.keepWarnStickyForMin=Integer.MAX_VALUE could be set; the HC executor 
could then append a result as follows:

INFO Checking Event Queue...
INFO Event Queue is currently fine.
WARN --- Sticky result from 2017-06-07 11:49 ---
INFO Checking Event Queue...
WARN Event Queue overloaded!

This means the full log of both the current result and a historic sticky 
result would be shown (the timeout handling already works similarly: if 
an HC times out, the last available HC result is shown). The HC executor 
has all the necessary metadata (the time is recorded in the execution 
result), so this would be easy to add. The best part is that you can 
change the sticky time and the "stickiness" by configuration only - no 
redeployment needed :)
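
To make the idea concrete, here is a rough sketch of the executor-side 
logic. It is only an illustration under assumptions: the class 
StickyResultSketch, its Execution holder and the report() method are 
hypothetical names and not part of the Sling Health Check API; only the 
property name hc.keepWarnStickyForMin comes from the proposal above. 
hc.keepCriticalStickyForMin would work analogously and is left out for 
brevity. The timestamp is printed in ISO-8601 form rather than the 
format shown in the log above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these types exist in Sling. */
class StickyResultSketch {

    enum Status { OK, WARN, CRITICAL }

    /** One HC execution: overall status, finish time and log lines. */
    static final class Execution {
        final Status status;
        final Instant finishedAt;
        final List<String> log;

        Execution(Status status, Instant finishedAt, List<String> log) {
            this.status = status;
            this.finishedAt = finishedAt;
            this.log = log;
        }
    }

    // Stands in for the proposed property hc.keepWarnStickyForMin.
    private final Duration keepWarnSticky;
    // Most recent execution that ended in WARN, if any.
    private Execution lastWarn;

    StickyResultSketch(long keepWarnStickyForMin) {
        this.keepWarnSticky = Duration.ofMinutes(keepWarnStickyForMin);
    }

    /** Returns the current log, plus the sticky WARN log if still valid. */
    List<String> report(Execution current) {
        List<String> out = new ArrayList<>(current.log);
        if (current.status == Status.WARN) {
            lastWarn = current; // a fresh WARN replaces the remembered one
        } else if (current.status == Status.OK && lastWarn != null) {
            Duration age = Duration.between(lastWarn.finishedAt, current.finishedAt);
            if (age.compareTo(keepWarnSticky) <= 0) {
                out.add("WARN --- Sticky result from " + lastWarn.finishedAt + " ---");
                out.addAll(lastWarn.log);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        StickyResultSketch hc = new StickyResultSketch(Integer.MAX_VALUE);
        Instant t0 = Instant.parse("2017-06-07T11:49:00Z");
        hc.report(new Execution(Status.WARN, t0,
                List.of("INFO Checking Event Queue...", "WARN Event Queue overloaded!")));
        List<String> later = hc.report(new Execution(Status.OK, t0.plusSeconds(3600),
                List.of("INFO Checking Event Queue...", "INFO Event Queue is currently fine.")));
        later.forEach(System.out::println);
    }
}
```

In this sketch the sticky block is only appended while the current 
result is OK; how a sticky WARN should combine with a current CRITICAL 
is left open.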

WDYT?

Best Regards
Georg

[1] 
https://issues.apache.org/jira/browse/SLING-6855?focusedCommentId=16010189&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16010189


> 
> For example, a quota has been tripped - warn for 30 minutes.
> 
> Or an events queue overflowed and the instance is considered damaged -
> raise a critical alarm forever.
> 
> With the current SLING-6855 one can raise such alarms but they are all
> grouped in a single health check - doing this results in that HC
> having both A and B tags and returning two results:
> 
>   ResultRegistry reg = sling.getService(ResultRegistry.class)
>   reg.put("testA", new Result(Result.Status.CRITICAL, "It's critical"),
>       null, "A");
>   reg.put("testB", new Result(Result.Status.WARN, "B is just a warning"),
>       null, "B");
> 
> So if you query for tag B you get both results, although they are 
> unrelated.
> 
> I would prefer creating one HC for each such alarm, and rename the
> service StickyResults instead of ResultRegistry.
> 
> So the above example (with service interface renamed) would cause two
> HCs to be created:
> 
> 1) StickyResult (testA); status CRITICAL, message "It's critical", tag A
> 2) StickyResult (testB); status WARN, message "B is just a warning", tag B
> 
> The HCs are keyed based on the "identifier" parameter, so in the above
> example putting another "testB" overwrites the existing one.
> 
> Clint and others, WDYT?
> 
> -Bertrand
