ignite-dev mailing list archives

From Nikolay Izhikov <nizhi...@apache.org>
Subject Re: Partition map exchange metrics
Date Tue, 16 Jul 2019 12:01:16 GMT
I think an administrator of an Ignite cluster should be able to monitor all Ignite processes, including
non-blocking PME.

On Tue, 16/07/2019 at 14:57 +0300, Anton Vinogradov wrote:
> BTW,
> I found an existing PME metric: getCurrentPmeDuration().
> It seems to show the whole PME time, so it is not very useful for this purpose.
> The goal is to show exactly the blocking period.
> When a PME causes no blocking, it's a good PME, and I see no reason to have
> monitoring related to it :)
> 
> On Tue, Jul 16, 2019 at 2:50 PM Nikolay Izhikov <nizhikov@apache.org> wrote:
> 
> > Anton,
> > 
> > Why do we need to postpone the implementation of these metrics?
> > Right now, implementing a new metric is very simple.
> > 
> > I think we can implement these metrics as a single contribution.
> > 
> > On Tue, 16/07/2019 at 13:47 +0300, Anton Vinogradov wrote:
> > > Nikita,
> > > 
> > > It looks like all we need now is one simple metric: are operations blocked?
> > > Just true or false.
> > > Let's start with this.
> > > All other metrics can be extracted from logs for now and can be implemented
> > > later.
> > > 
> > > On Tue, Jul 16, 2019 at 12:46 PM Nikolay Izhikov <nizhikov@apache.org>
> > > wrote:
> > > 
> > > > +1.
> > > > 
> > > > Nikita, please, go ahead.
> > > > 
> > > > 
> > > > On Tue, 16 July 2019, 11:45, Nikita Amelchev <nsamelchev@gmail.com> wrote:
> > > > 
> > > > > Hello, Igniters.
> > > > > 
> > > > > I suggest adding some useful metrics about the partition map exchange
> > > > > (PME). For now, the durations of PME stages are available only in log
> > > > > files and cannot be obtained using JMX or other external tools. [1]
> > > > > 
> > > > > I made a list of local node metrics that help to understand the
> > > > > actual status of the current PME:
> > > > > 
> > > > > 1. initialVersion. Topology version that initiated the exchange.
> > > > > 2. initTime. Time when PME started.
> > > > > 3. initEvent. Event that triggered PME.
> > > > > 4. partitionReleaseTime. Time when a node finished waiting for all
> > > > > updates and transactions on the previous topology.
> > > > > 5. sendSingleMessageTime. Time when a node sent a single message.
> > > > > 6. recieveFullMessageTime. Time when a node received a full message.
> > > > > 7. finishTime. Time when PME ended.
> > > > > 
> > > > > When a new PME starts, all these metrics are reset.
> > > > > 
> > > > > These metrics help to understand:
> > > > > - how long the PME took (current or previous);
> > > > > - how long the node waited for all updates to complete;
> > > > > - which node blocks the PME (has not sent a single message);
> > > > > - what triggered the PME.
> > > > > 
> > > > > Thoughts?
> > > > > 
> > > > > [1] https://issues.apache.org/jira/browse/IGNITE-11961
> > > > > 
> > > > > --
> > > > > Best wishes,
> > > > > Amelchev Nikita
> > > > > 
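
A minimal Java sketch of how the local-node metrics proposed in the thread could be held and reset, combined with the single "are operations blocked?" flag suggested as a starting point. The class name, method names, the `NOT_SET` sentinel and the event strings are hypothetical illustrations, not Ignite's actual metrics API; the topology version is simplified to a `long`, and timestamps are passed in explicitly (milliseconds) to keep the example deterministic. Whether a given exchange blocks operations is assumed to be known by the caller.

```java
/**
 * Hypothetical holder for the local-node PME metrics proposed in IGNITE-11961.
 * Names mirror the proposal (with "receive" spelled correctly); this is an
 * illustration only, not Ignite's actual metrics API.
 */
class PmeMetricsSketch {
    /** Sentinel meaning "this stage has not happened yet in the current PME". */
    static final long NOT_SET = -1L;

    private long initialVersion = NOT_SET;  // topology version (simplified to a long)
    private String initEvent;               // event that triggered the PME
    private long initTime = NOT_SET;
    private long partitionReleaseTime = NOT_SET;
    private long sendSingleMessageTime = NOT_SET;
    private long receiveFullMessageTime = NOT_SET;
    private long finishTime = NOT_SET;

    /** The single true/false metric: are operations currently blocked by the PME? */
    private boolean operationsBlocked;

    /** A new PME starts: every value left over from the previous PME is reset. */
    synchronized void onExchangeStarted(long topVer, String evt, long nowMs, boolean blocksOperations) {
        initialVersion = topVer;
        initEvent = evt;
        initTime = nowMs;
        partitionReleaseTime = NOT_SET;
        sendSingleMessageTime = NOT_SET;
        receiveFullMessageTime = NOT_SET;
        finishTime = NOT_SET;
        operationsBlocked = blocksOperations;
    }

    synchronized void onPartitionsReleased(long nowMs) { partitionReleaseTime = nowMs; }

    synchronized void onSingleMessageSent(long nowMs) { sendSingleMessageTime = nowMs; }

    synchronized void onFullMessageReceived(long nowMs) { receiveFullMessageTime = nowMs; }

    synchronized void onExchangeFinished(long nowMs) {
        finishTime = nowMs;
        operationsBlocked = false; // the PME is over, so it no longer blocks anything
    }

    synchronized boolean isOperationsBlocked() { return operationsBlocked; }

    synchronized long getInitialVersion() { return initialVersion; }

    synchronized String getInitEvent() { return initEvent; }

    /** Duration of a finished PME, or time elapsed so far for a running one; NOT_SET if none started. */
    synchronized long durationMs(long nowMs) {
        if (initTime == NOT_SET)
            return NOT_SET;
        return (finishTime != NOT_SET ? finishTime : nowMs) - initTime;
    }

    /** How long the node waited for updates and transactions on the previous topology; NOT_SET if still waiting. */
    synchronized long partitionReleaseMs() {
        return partitionReleaseTime == NOT_SET ? NOT_SET : partitionReleaseTime - initTime;
    }

    /** Time between sending the single message and receiving the full message; NOT_SET if either is missing. */
    synchronized long fullMessageWaitMs() {
        if (sendSingleMessageTime == NOT_SET || receiveFullMessageTime == NOT_SET)
            return NOT_SET;
        return receiveFullMessageTime - sendSingleMessageTime;
    }

    /** True while a PME is running and this node has not sent its single message yet. */
    synchronized boolean singleMessagePending() {
        return initTime != NOT_SET && finishTime == NOT_SET && sendSingleMessageTime == NOT_SET;
    }
}
```

In this sketch a non-blocking exchange still records all timestamps, so an administrator can monitor it, while `isOperationsBlocked()` stays false for it; both points of view in the thread are served by the same holder.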
