sling-dev mailing list archives

From "Thomas Mueller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SLING-8408) DistributionQueueHealthCheck should deal with failing queries
Date Thu, 09 May 2019 11:45:00 GMT

    [ https://issues.apache.org/jira/browse/SLING-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836298#comment-16836298 ]

Thomas Mueller commented on SLING-8408:
---------------------------------------

Patch for sling-org-apache-sling-distribution-core:

{noformat}
diff --git a/src/main/java/org/apache/sling/distribution/monitor/DistributionQueueHealthCheck.java b/src/main/java/org/apache/sling/distribution/monitor/DistributionQueueHealthCheck.java
index 38bf41e..caffc0d 100644
--- a/src/main/java/org/apache/sling/distribution/monitor/DistributionQueueHealthCheck.java
+++ b/src/main/java/org/apache/sling/distribution/monitor/DistributionQueueHealthCheck.java
@@ -124,8 +124,9 @@ public class DistributionQueueHealthCheck implements HealthCheck {
                         } else {
                             resultLog.debug("No items in queue [{}]", q.getName());
                         }
-
-                    } catch (Exception e) {
+                    } catch (IllegalStateException e) {
+                        resultLog.healthCheckError("The job index is not available (just yet) while inspecting replication agent [{}]", queueName);
+                    } catch (Exception e) {
                         resultLog.warn("Exception while inspecting distribution queue [{}]: {}", queueName, e);
                     }
                 }
{noformat}

* Catch IllegalStateException, since that is what SLING-8407 throws when no index is available.
* Report this as a health check error: it means the index is not available, which can happen
at the very first startup, or later on if someone removes the index. In both cases the system
is not in a good state, so reporting an error is appropriate. I would not expect anybody to
monitor the health checks during the very first startup (while the repository is being
initialized), but I would argue that the system is in fact not available during that time.
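
The catch ordering in the patch can be tried out in isolation. What follows is only a minimal sketch, not the Sling code: QueueCheckSketch, its Status enum and the inspect helper are hypothetical stand-ins for the health check, its result status and the queue lookup. It assumes the missing index surfaces as an IllegalStateException, as described for SLING-8407. The more specific catch has to come first, because Java rejects a catch clause for a subclass that follows a catch of its superclass.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for the health check; not part of Sling.
final class QueueCheckSketch {

    // Hypothetical result status, loosely modelled on the levels named in the issue.
    enum Status { OK, WARN, HEALTH_CHECK_ERROR }

    // Runs the queue lookup and maps failures to a status.
    // The Supplier stands in for the call that ends up querying the job index.
    static Status inspect(Supplier<List<String>> queueItems) {
        try {
            queueItems.get();
            return Status.OK;
        } catch (IllegalStateException e) {
            // Assumption: the index is missing, so the system is not usable.
            return Status.HEALTH_CHECK_ERROR;
        } catch (Exception e) {
            // Any other failure stays a warning, as before the patch.
            return Status.WARN;
        }
    }

    public static void main(String[] args) {
        System.out.println(inspect(() -> { throw new IllegalStateException("no index"); }));
        System.out.println(inspect(() -> { throw new UnsupportedOperationException("other"); }));
        System.out.println(inspect(() -> List.of("item")));
    }
}
```

Keeping the generic catch as a warning leaves the behaviour for all other failures unchanged; only the missing-index case is escalated.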


> DistributionQueueHealthCheck should deal with failing queries
> -------------------------------------------------------------
>
>                 Key: SLING-8408
>                 URL: https://issues.apache.org/jira/browse/SLING-8408
>             Project: Sling
>          Issue Type: Improvement
>          Components: Content Distribution
>            Reporter: Thomas Mueller
>            Priority: Major
>
> The following health check indirectly runs queries that might fail:
>  * [DistributionQueueHealthCheck|https://github.com/apache/sling-org-apache-sling-distribution-core/blob/master/src/main/java/org/apache/sling/distribution/monitor/DistributionQueueHealthCheck.java]: sling-org-apache-sling-distribution-core/src/main/java/org/apache/sling/distribution/monitor
> It calls [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373], which with SLING-8407 can throw an exception if the index is not yet available. The health check should catch this exception and return HEALTH_CHECK_ERROR in this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
