hbase-issues mailing list archives

From "Ashu Pachauri (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-16302) age of last shipped op and age of last applied op should be a histogram
Date Mon, 21 Nov 2016 22:50:58 GMT

     [ https://issues.apache.org/jira/browse/HBASE-16302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashu Pachauri updated HBASE-16302:
----------------------------------
    Description: 
Replication exports the metric ageOfLastShippedOp as an indication of how far replication is
lagging. With multiwal enabled, however, the metric is not representative: replication can
lag for a long time for one WAL group (something is wrong with a particular region) while
being fine for the others. In that case, ageOfLastShippedOp becomes useless for alerting.

Also, since there is no mapping between individual replication sources and replication sinks,
the age of the last applied op can be a highly spiky metric if only certain replication
sources are lagging.

We should use histograms for these metrics and use the maximum value of each histogram to
report replication lag when building stats.

  was:
Replication exports metric ageOfLastShippedOp as an indication of how much replication is
lagging. But, with multiwal enabled, it's not representative because replication could be
lagging for a long time for one wal group (something wrong with a particular region) while
being fine for others. The ageOfLastShippedOp becomes a useless metric for alerting in such
a case.

We should just report the maximum of the age of last shipped ops across walgroups.


> age of last shipped op and age of last applied op should be a histogram
> -----------------------------------------------------------------------
>
>                 Key: HBASE-16302
>                 URL: https://issues.apache.org/jira/browse/HBASE-16302
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>            Reporter: Ashu Pachauri
>            Assignee: Ashu Pachauri
>
> Replication exports the metric ageOfLastShippedOp as an indication of how far replication
> is lagging. With multiwal enabled, however, the metric is not representative: replication
> can lag for a long time for one WAL group (something is wrong with a particular region)
> while being fine for the others. In that case, ageOfLastShippedOp becomes useless for
> alerting.
> Also, since there is no mapping between individual replication sources and replication
> sinks, the age of the last applied op can be a highly spiky metric if only certain
> replication sources are lagging.
> We should use histograms for these metrics and use the maximum value of each histogram to
> report replication lag when building stats.
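
The proposal in the description can be illustrated with a small sketch. This is not
HBase code: AgeHistogram, report_lag, and the WAL group names are hypothetical, and the
histogram is a toy that only keeps raw samples. It shows why a single "last written"
gauge can hide a lagging WAL group while the histogram maximum does not.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgeHistogram:
    """Toy stand-in for a metrics histogram (hypothetical, keeps raw samples)."""
    samples: List[int] = field(default_factory=list)

    def update(self, age_ms: int) -> None:
        self.samples.append(age_ms)

    def max_value(self) -> int:
        # An empty histogram reports zero lag.
        return max(self.samples) if self.samples else 0


def report_lag(ages_by_wal_group: Dict[str, int]) -> Dict[str, int]:
    """Compare a single overwritten gauge with the histogram maximum."""
    last_gauge = 0
    hist = AgeHistogram()
    for age_ms in ages_by_wal_group.values():
        last_gauge = age_ms   # gauge: each WAL group overwrites the previous value
        hist.update(age_ms)   # histogram: every WAL group's age is retained
    return {"gauge": last_gauge, "histogram_max": hist.max_value()}


# Assumed sample data: one WAL group lags by two minutes, the others are healthy.
ages = {"walgroup-1": 120_000, "walgroup-2": 40, "walgroup-3": 55}
result = report_lag(ages)
print(result)
```

With this sample data the gauge ends up holding the age from whichever WAL group
reported last, so the two-minute lag of walgroup-1 is visible only through the
histogram maximum.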



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
