ignite-dev mailing list archives

From Nikita Amelchev <nsamelc...@gmail.com>
Subject Re: Partition map exchange metrics
Date Fri, 19 Jul 2019 11:40:37 GMT
Anton, Nikolay,

Thanks for the support.

Currently, we have the getCurrentPmeDuration() metric, which does not correctly
show the PME's influence on the cluster: a PME can proceed without blocking
operations, for example on client node join/leave events.

I suggest adding a new metric, isOperationsBlockedByPme(). Together, these
metrics will show the influence of PME on the cluster and on user operations.

I have prepared a PR for this [1] (the Bot visa is green). Could anyone take a look?

[1] https://issues.apache.org/jira/browse/IGNITE-11961
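A minimal sketch of how the two metrics complement each other, assuming hypothetical hooks (onExchangeStart, onOperationsBlocked, onOperationsUnblocked, onExchangeFinish) that the exchange process would call. These names are illustrative and are not Ignite's API; the clock is passed in explicitly only to keep the example deterministic.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical local-node tracker for the two PME metrics; not Ignite's actual code. */
class PmeBlockingMetrics {
    /** Marks that no PME is currently in progress. */
    private static final long NO_PME = -1L;

    /** Start time (ms) of the current PME, or NO_PME. */
    private final AtomicLong pmeStartTime = new AtomicLong(NO_PME);

    /** Whether the current PME blocks user operations right now. */
    private final AtomicBoolean opsBlocked = new AtomicBoolean(false);

    /** Hypothetical hook: a new exchange has started. */
    void onExchangeStart(long nowMs) {
        pmeStartTime.set(nowMs);
    }

    /** Hypothetical hook: the exchange starts blocking user operations. */
    void onOperationsBlocked() {
        opsBlocked.set(true);
    }

    /** Hypothetical hook: user operations are allowed again. */
    void onOperationsUnblocked() {
        opsBlocked.set(false);
    }

    /** Hypothetical hook: the exchange has completed. */
    void onExchangeFinish() {
        opsBlocked.set(false);
        pmeStartTime.set(NO_PME);
    }

    /** Duration (ms) of the current PME, or 0 if no PME is running. */
    long getCurrentPmeDuration(long nowMs) {
        long start = pmeStartTime.get();
        return start == NO_PME ? 0 : nowMs - start;
    }

    /** True only while the current PME actually blocks user operations. */
    boolean isOperationsBlockedByPme() {
        return opsBlocked.get();
    }

    public static void main(String[] args) {
        PmeBlockingMetrics m = new PmeBlockingMetrics();

        // Client node join: the PME takes time but never blocks operations.
        m.onExchangeStart(1_000);
        System.out.println(m.getCurrentPmeDuration(1_250) + " " + m.isOperationsBlockedByPme());
        m.onExchangeFinish();

        // An exchange that does block operations for part of its duration.
        m.onExchangeStart(2_000);
        m.onOperationsBlocked();
        System.out.println(m.getCurrentPmeDuration(2_300) + " " + m.isOperationsBlockedByPme());
        m.onOperationsUnblocked();
        m.onExchangeFinish();
    }
}
```

Running main prints "250 false" and then "300 true": the duration alone cannot tell the two exchanges apart, while the boolean metric can.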

On Tue, 16 Jul 2019 at 14:58, Nikolay Izhikov <nizhikov@apache.org> wrote:

>
> I think an administrator of an Ignite cluster should be able to monitor all Ignite processes,
> including non-blocking PME.
>
> On Tue, 16/07/2019 at 14:57 +0300, Anton Vinogradov wrote:
> > BTW,
> > I found an existing PME metric: getCurrentPmeDuration().
> > It seems to show the whole PME time, which makes it not very useful.
> > The goal is to show exactly the blocking period.
> > When a PME causes no blocking, it's a good PME and I see no reason to
> > monitor it :)
> >
> > On Tue, Jul 16, 2019 at 2:50 PM Nikolay Izhikov <nizhikov@apache.org> wrote:
> >
> > > Anton,
> > >
> > > Why should we postpone the implementation of these metrics?
> > > Currently, implementing a new metric is very simple.
> > >
> > > I think we can implement these metrics as a single contribution.
> > >
> > > On Tue, 16/07/2019 at 13:47 +0300, Anton Vinogradov wrote:
> > > > Nikita,
> > > >
> > > > Looks like all we need now is one simple metric: are operations blocked?
> > > > Just true or false.
> > > > Let's start with this.
> > > > All other metrics can already be extracted from logs and can be implemented
> > > > later.
> > > >
> > > > On Tue, Jul 16, 2019 at 12:46 PM Nikolay Izhikov <nizhikov@apache.org>
> > > > wrote:
> > > >
> > > > > +1.
> > > > >
> > > > > Nikita, please, go ahead.
> > > > >
> > > > >
> > > > > On Tue, 16 Jul 2019, 11:45 Nikita Amelchev <nsamelchev@gmail.com> wrote:
> > > > >
> > > > > > Hello, Igniters.
> > > > > >
> > > > > > I suggest adding some useful metrics about the partition map exchange
> > > > > > (PME). Currently, the durations of PME stages are available only in log
> > > > > > files and cannot be obtained using JMX or other external tools. [1]
> > > > > >
> > > > > > I have made a list of local node metrics that help to understand the
> > > > > > actual status of the current PME:
> > > > > >
> > > > > > 1. initialVersion. Topology version that initiates the exchange.
> > > > > > 2. initTime. Time the PME started.
> > > > > > 3. initEvent. Event that triggered the PME.
> > > > > > 4. partitionReleaseTime. Time when a node finished waiting for all
> > > > > > updates and transactions on the previous topology.
> > > > > > 5. sendSingleMessageTime. Time when a node sent a single message.
> > > > > > 6. receiveFullMessageTime. Time when a node received a full message.
> > > > > > 7. finishTime. Time the PME ended.
> > > > > >
> > > > > > When a new PME starts, all these metrics are reset.
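The proposed per-node metrics and the reset rule could look roughly like the following sketch. The class, the hook methods, and the use of strings for the topology version and the triggering event are simplifying assumptions, not Ignite code; a value of 0 marks a stage not yet reached in the current PME.

```java
/**
 * Hypothetical holder for the proposed local-node PME metrics.
 * Times are epoch milliseconds; 0 means the stage has not been reached yet.
 */
class PmeStageMetrics {
    private String initialVersion; // simplified: a real topology version is richer than a string
    private long initTime;
    private String initEvent;      // simplified: the event that triggered the PME
    private long partitionReleaseTime;
    private long sendSingleMessageTime;
    private long receiveFullMessageTime;
    private long finishTime;

    /** A new PME starts: reset every metric, then record the initial values. */
    synchronized void onInit(String topVer, String evt, long nowMs) {
        partitionReleaseTime = 0;
        sendSingleMessageTime = 0;
        receiveFullMessageTime = 0;
        finishTime = 0;
        initialVersion = topVer;
        initEvent = evt;
        initTime = nowMs;
    }

    // Hypothetical hooks called as the exchange passes each stage.
    synchronized void onPartitionsReleased(long nowMs) { partitionReleaseTime = nowMs; }
    synchronized void onSingleMessageSent(long nowMs) { sendSingleMessageTime = nowMs; }
    synchronized void onFullMessageReceived(long nowMs) { receiveFullMessageTime = nowMs; }
    synchronized void onFinish(long nowMs) { finishTime = nowMs; }

    // Getters are synchronized so readers on other threads (e.g. a JMX thread) see current values.
    synchronized String getInitialVersion() { return initialVersion; }
    synchronized long getInitTime() { return initTime; }
    synchronized String getInitEvent() { return initEvent; }
    synchronized long getPartitionReleaseTime() { return partitionReleaseTime; }
    synchronized long getSendSingleMessageTime() { return sendSingleMessageTime; }
    synchronized long getReceiveFullMessageTime() { return receiveFullMessageTime; }
    synchronized long getFinishTime() { return finishTime; }

    public static void main(String[] args) {
        PmeStageMetrics m = new PmeStageMetrics();

        m.onInit("5.0", "NODE_JOINED", 100);
        m.onPartitionsReleased(130);
        m.onSingleMessageSent(140);
        m.onFullMessageReceived(190);
        m.onFinish(200);
        System.out.println(m.getInitEvent() + " finished at " + m.getFinishTime());

        // The next PME resets the values recorded for the previous one.
        m.onInit("6.0", "NODE_LEFT", 500);
        System.out.println(m.getInitEvent() + " finished at " + m.getFinishTime());
    }
}
```

Running main prints "NODE_JOINED finished at 200" and then "NODE_LEFT finished at 0", since finishTime is reset when the second PME starts.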
> > > > > >
> > > > > > These metrics help to understand:
> > > > > > - how long the PME took (current or previous).
> > > > > > - how long the node waited for all updates to complete.
> > > > > > - which node blocks the PME (has not sent a single message).
> > > > > > - what triggered the PME.
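These diagnostics can be derived from the proposed timestamps. The sketch below assumes epoch-millisecond values with 0 meaning "not reached yet", and assumes the per-node sendSingleMessageTime values have already been collected from the cluster (how, for example over JMX, is out of scope). All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helpers deriving diagnostics from the proposed PME timestamps. */
class PmeDiagnostics {
    /** How long the PME took: uses finishTime if finished, otherwise the current time. */
    static long pmeDuration(long initTime, long finishTime, long nowMs) {
        return (finishTime != 0 ? finishTime : nowMs) - initTime;
    }

    /** How long the node waited for updates on the previous topology, or -1 if still waiting. */
    static long partitionReleaseWait(long initTime, long partitionReleaseTime) {
        return partitionReleaseTime == 0 ? -1 : partitionReleaseTime - initTime;
    }

    /** Nodes that have not sent their single message yet and therefore hold up the exchange. */
    static List<String> blockingNodes(Map<String, Long> sendSingleMessageTimeByNode) {
        List<String> res = new ArrayList<>();
        for (Map.Entry<String, Long> e : sendSingleMessageTimeByNode.entrySet()) {
            if (e.getValue() == 0L) {
                res.add(e.getKey());
            }
        }
        return res;
    }

    public static void main(String[] args) {
        // Hypothetical values collected from three nodes for the same PME.
        Map<String, Long> sent = new TreeMap<>();
        sent.put("node-1", 1_040L);
        sent.put("node-2", 0L);
        sent.put("node-3", 1_055L);

        System.out.println(pmeDuration(1_000, 0, 1_500));       // PME still running
        System.out.println(partitionReleaseWait(1_000, 1_030));
        System.out.println(blockingNodes(sent));
    }
}
```

With these inputs, main prints 500, 30 and [node-2].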
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-11961
> > > > > >
> > > > > > --
> > > > > > Best wishes,
> > > > > > Amelchev Nikita
> > > > > >



--
Best wishes,
Amelchev Nikita
