geode-dev mailing list archives

From Jason Huynh <jhu...@pivotal.io>
Subject Re: [Proposal] Thread monitoring mechanism
Date Wed, 21 Feb 2018 18:53:50 GMT
I am assuming this would be for all threads/thread pools and not specific to
Function threads. I wonder what the impact would be on put/get operations,
or are we going to target specific operations?



On Tue, Feb 20, 2018 at 1:04 AM Gregory Vortman <Gregory.Vortman@amdocs.com>
wrote:

> Hello team,
> One of the most severe issues hitting our real-time application is threads
> getting stuck for various reasons, such as long-lasting locks, deadlocks,
> and threads that wait forever for a reply after a packet drop.
> Stuck threads of this kind go under the radar of the existing system
> health check methods.
> In mission-critical applications, this results in an immediate outage.
>
> As a short-term solution, we are implementing an internal watchdog
> mechanism to detect stuck threads:
>                There is a registration object.
>                The function executor has start/end hooks to
> register/unregister the thread via the registration object.
> A customized monitoring scheduled thread is spawned on startup. The thread
> wakes up every N seconds, scans the registration map, and detects threads
> that have stayed registered for longer than a configurable time.
> Once such threads have been detected, a process stack is taken and a
> thread stack statistic metric is provided.
>
> This helps us to monitor, detect, and quickly decide which action to
> take. Usually the decision is to bounce the member (a consistency issue
> is possible, but in our case that is better than denial of service).
> The above solution does not touch Geode core code; it is implemented
> entirely within our customized code.
>
> I would like to propose introducing a long-term, generic thread
> monitoring mechanism to detect threads that are stuck for any reason.
> It would maintain a monitoring object with start/end methods, invoked
> similarly to FunctionStats.startFunctionExecution and
> FunctionStats.endFunctionExecution.
>
> Your feedback would be appreciated.
>
> Thank you for your cooperation.
> Best regards!
>
> Gregory Vortman
>
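
For reference, the watchdog described in the quoted proposal could be
sketched roughly as follows. This is only a minimal Java sketch under
stated assumptions: the class StuckThreadMonitor and its methods
(register, unregister, findStuck, startScanning) are hypothetical names
and not part of Geode, and the per-thread stack is read with
Thread.getStackTrace rather than a full process stack dump.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical watchdog sketch; none of these names exist in Geode.
class StuckThreadMonitor {
    // Registration map: worker thread -> time (ms) at which it registered.
    private final Map<Thread, Long> startTimes = new ConcurrentHashMap<>();
    private final long thresholdMillis;

    StuckThreadMonitor(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    // Start hook: call at the beginning of the monitored operation.
    void register() {
        startTimes.put(Thread.currentThread(), System.currentTimeMillis());
    }

    // End hook: call in a finally block when the operation completes.
    void unregister() {
        startTimes.remove(Thread.currentThread());
    }

    // Threads that have stayed registered longer than the threshold
    // as of nowMillis.
    List<Thread> findStuck(long nowMillis) {
        List<Thread> stuck = new ArrayList<>();
        for (Map.Entry<Thread, Long> e : startTimes.entrySet()) {
            if (nowMillis - e.getValue() > thresholdMillis) {
                stuck.add(e.getKey());
            }
        }
        return stuck;
    }

    // Spawns a daemon thread that scans the map every periodSeconds
    // and prints the stack of each thread that looks stuck.
    ScheduledExecutorService startScanning(long periodSeconds) {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "stuck-thread-monitor");
                t.setDaemon(true);
                return t;
            });
        scheduler.scheduleAtFixedRate(() -> {
            for (Thread t : findStuck(System.currentTimeMillis())) {
                System.err.println("Possibly stuck: " + t.getName());
                for (StackTraceElement frame : t.getStackTrace()) {
                    System.err.println("    at " + frame);
                }
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

A monitored operation would call register() before its work and
unregister() in a finally block, so the added cost per operation is one
map put and one map remove. Whether that is acceptable on the put/get
path is the open question raised above.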
