mesos-issues mailing list archives

From "Xudong Ni (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MESOS-9178) Add a metric for master failover time.
Date Thu, 13 Sep 2018 00:25:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612890#comment-16612890
] 

Xudong Ni edited comment on MESOS-9178 at 9/13/18 12:24 AM:
------------------------------------------------------------

The difference between the PR and what Yan suggested is how we calculate the comparison base.
The comparison base in the PR is the number of reregistrations that actually happened (as such,
p9999 is guaranteed, and the max is the last reregistration). The comparison base in what Yan
suggested is the number of reregistrations that actually happened plus the reregistrations that
didn't go through, such as those of unreachable agents. Since we already have a metric covering
unreachable agents, I think it may be better not to bake that factor into this metric? The
percentage in the proposal would not only represent registration performance but would also be
affected by the number of unreachable agents.

The description was updated to reflect the approach.
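
To make the two comparison bases concrete, here is a minimal Python sketch. It assumes a nearest-rank percentile; the function names and sample numbers are hypothetical and are not taken from the PR or from the Mesos code base.

```python
import math


def nearest_rank(n, p):
    """1-based nearest-rank position of percentile p (0 < p <= 100) among n items."""
    return max(math.ceil(p * n / 100), 1)


def completed_base_percentile(deltas, p):
    """Base = reregistrations that actually happened (the PR's approach, as described).

    Every percentile maps to a real sample, and p100 is the last reregistration.
    """
    ordered = sorted(deltas)
    if not ordered:
        raise ValueError("no completed reregistrations")
    return ordered[nearest_rank(len(ordered), p) - 1]


def all_agents_base_percentile(deltas, unreachable_count, p):
    """Base = completed reregistrations + agents that never reregistered.

    Returns None when the requested percentile falls among the agents that
    never reregistered, i.e. there is no time delta to report.
    """
    ordered = sorted(deltas)
    rank = nearest_rank(len(ordered) + unreachable_count, p)
    if rank > len(ordered):
        return None
    return ordered[rank - 1]


# Hypothetical sample: seconds from master election to each agent's reregistration.
deltas = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 8.0, 12.0]
unreachable = 2  # hypothetical number of agents marked unreachable

print(completed_base_percentile(deltas, 50))                # 3.0
print(completed_base_percentile(deltas, 99.99))             # 12.0
print(all_agents_base_percentile(deltas, unreachable, 50))  # 4.0
print(all_agents_base_percentile(deltas, unreachable, 90))  # None
```

With the completed-only base, the high percentiles always resolve to an observed delta. With the larger base, the same percentile shifts, or becomes undefined, as the number of unreachable agents changes, even when reregistration timing itself is identical.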



> Add a metric for master failover time.
> --------------------------------------
>
>                 Key: MESOS-9178
>                 URL: https://issues.apache.org/jira/browse/MESOS-9178
>             Project: Mesos
>          Issue Type: Improvement
>          Components: master
>            Reporter: Xudong Ni
>            Assignee: Xudong Ni
>            Priority: Minor
>
> When an agent reregisters, the time delta between the master elected
> time and that moment is saved. As reregistration progresses, each
> data entry represents the registration time delta from the master
> elected time. The percentiles of these data, as exposed in these
> metrics, can represent overall reregistration progress. In case of
> degradation towards the end of reregistration, the high percentiles
> will reflect it.
> Note: These metrics only represent completed reregistrations. They
> do not monitor agents that were eventually marked as unreachable and
> for which reregistration didn't actually happen; unreachable agents
> are already monitored by existing metrics.
> https://reviews.apache.org/r/68706/



