ignite-dev mailing list archives

From Dmitriy_Sorokin <sbt.sorokin....@gmail.com>
Subject Facility to detect long STW pauses and other system response degradations
Date Fri, 17 Nov 2017 13:08:21 GMT
Hi, Igniters!

This discussion thread relates to
https://issues.apache.org/jira/browse/IGNITE-6171.

Currently AI has no tools for monitoring JVM performance, for example the
impact of GC pauses (e.g. STW) on the operation of a node. I think we should
add this functionality.

1) It is useful to know when STW duration has increased, or when any other
situation leads to similar consequences.
This will allow system administrators to resolve issues before they become
problems.

I propose adding a dedicated thread that records the current time every N
milliseconds and checks the difference from the last recorded value.
A difference noticeably larger than N indicates that the JVM was paused.
The maximum and total pause values for a given period can be published as
dedicated metrics available through JMX.
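
To make the first part more concrete, here is a minimal sketch of such a
thread. The class name PauseDetector, its methods and the threshold parameter
are hypothetical and not part of Ignite; publishing the values through a JMX
MXBean is left out to keep the example short.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of the proposed thread; not an existing Ignite class. */
public class PauseDetector {
    private final long intervalMs;   // N: how often the thread wakes up
    private final long thresholdMs;  // assumption: ignore jitter below this value
    private final AtomicLong maxPauseMs = new AtomicLong();
    private final AtomicLong totalPauseMs = new AtomicLong();

    public PauseDetector(long intervalMs, long thresholdMs) {
        this.intervalMs = intervalMs;
        this.thresholdMs = thresholdMs;
    }

    /** The pause is the time that elapsed beyond the expected sleep interval. */
    public static long pauseMs(long elapsedMs, long intervalMs) {
        return Math.max(0L, elapsedMs - intervalMs);
    }

    /** Updates the max and total counters for one observed wake-up. */
    public void record(long elapsedMs) {
        long pause = pauseMs(elapsedMs, intervalMs);
        if (pause >= thresholdMs) {
            totalPauseMs.addAndGet(pause);
            maxPauseMs.accumulateAndGet(pause, Math::max);
        }
    }

    /** Starts a daemon thread that sleeps for N ms and measures the overshoot. */
    public Thread start() {
        Thread t = new Thread(() -> {
            long prev = System.nanoTime();
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    return;
                }
                long now = System.nanoTime();
                record((now - prev) / 1_000_000L);
                prev = now;
            }
        }, "pause-detector");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public long maxPauseMs() {
        return maxPauseMs.get();
    }

    public long totalPauseMs() {
        return totalPauseMs.get();
    }
}
```

The sketch uses System.nanoTime() because it is monotonic, so wall-clock
adjustments are not mistaken for pauses. Note that it cannot tell a GC pause
from any other delay in scheduling the thread.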

2) If the pause reaches a critical value, we need to stop the node without
waiting for the end of the pause.

The thread from the first part of the proposed solution can estimate the
pause duration, but only after the pause has completed.
So we need an external thread (in another JVM or in native code) that can
recognize that the pause duration has passed the critical mark.

We can estimate the duration of an STW (or similar) pause by
 a) reading the value updated by the first thread in some way (e.g. via JMX,
shared memory or a shared file)
 or
 b) using JVM diagnostic tools. Does anybody know of cross-platform
solutions?
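
As an illustration of option a) with a shared file, here is a rough sketch.
The class name HeartbeatWatchdog, the file format (a single decimal
timestamp) and the criticalMs parameter are assumptions made for the example;
what to do once the mark is passed (e.g. killing the node process) is left to
the caller.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of option a) using a shared file; not an Ignite class. */
public class HeartbeatWatchdog {
    /** Called inside the node by the first thread every N ms. */
    public static void writeHeartbeat(Path file, long nowMs) throws IOException {
        Files.write(file, Long.toString(nowMs).getBytes(StandardCharsets.UTF_8));
    }

    /** Called by the external process to read the last recorded time. */
    public static long readHeartbeat(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return Long.parseLong(text.trim());
    }

    /** True if the node has not updated the heartbeat for more than criticalMs. */
    public static boolean isCritical(long lastHeartbeatMs, long nowMs, long criticalMs) {
        return nowMs - lastHeartbeatMs > criticalMs;
    }
}
```

Both sides would have to use wall-clock time (System.currentTimeMillis()),
since System.nanoTime() values are not comparable across processes. The plain
file write here is not atomic, so a real implementation would need to handle
a partially written value.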

Feel free to suggest ideas or tips, especially about the second part of the
proposal.

Thoughts?



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
