hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-11143) Improve replication metrics
Date Wed, 14 May 2014 00:55:17 GMT

     [ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-11143:
----------------------------------

    Description: 
We are trying to report on replication lag and find that there is no good single metric to
do that.
ageOfLastShippedOp is close, but unfortunately it is increased even when there is nothing
to ship on a particular RegionServer.

I would like to discuss a few options here:
Add a new metric: replicationQueueTime (or something) with the above meaning. I.e. if we have
something to ship we set it to the age of the last shipped edit; if we fail we increment that
time (just like we do now). But if there is nothing to replicate we set it to the current time
(and hence the metric is reported as close to 0).
Alternatively we could change the meaning of ageOfLastShippedOp to do that. That might
lead to surprises, but the current behavior is clearly weird when there is nothing to replicate.
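
A minimal sketch of the first option, to make the proposed semantics concrete. The class and
method names here are hypothetical and are not taken from the HBase replication metrics code
or from the attached patches. It assumes the metric is derived from a single timestamp and
that "incrementing on failure" simply means leaving that timestamp alone so the reported age grows.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical illustration of the proposed replicationQueueTime metric.
 * Not the actual HBase implementation.
 */
final class ReplicationLagSketch {
    // Timestamp (ms) up to which this source is considered caught up.
    private final AtomicLong caughtUpTimestamp;

    ReplicationLagSketch(long nowMs) {
        this.caughtUpTimestamp = new AtomicLong(nowMs);
    }

    /** A batch was shipped: record the write time of its last edit. */
    void onShipped(long lastEditWriteTimeMs) {
        caughtUpTimestamp.set(lastEditWriteTimeMs);
    }

    /** Nothing is queued: nothing is lagging, so move the marker to "now". */
    void onNothingToReplicate(long nowMs) {
        caughtUpTimestamp.set(nowMs);
    }

    /**
     * Reported lag in ms. A failed shipment updates nothing, so the value
     * keeps growing until a later shipment succeeds.
     */
    long replicationQueueTime(long nowMs) {
        return Math.max(0L, nowMs - caughtUpTimestamp.get());
    }
}
```

With this shape an idle RegionServer reports a value close to 0, while a RegionServer that has
edits queued but cannot ship them reports a steadily growing value.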

Comments? [~jdcryans], [~stack].

If the approach sounds good, I'll make a patch for all branches.

Edit: Also adds a new 

  was:
We are trying to report on replication lag and find that there is no good single metric to
do that.
ageOfLastShippedOp is close, but unfortunately it is increased even when there is nothing
to ship on a particular RegionServer.

I would like to discuss a few options here:
Add a new metric: replicationQueueTime (or something) with the above meaning. I.e. if we have
something to ship we set it to the age of the last shipped edit; if we fail we increment that
time (just like we do now). But if there is nothing to replicate we set it to the current time
(and hence the metric is reported as close to 0).
Alternatively we could change the meaning of ageOfLastShippedOp to do that. That might
lead to surprises, but the current behavior is clearly weird when there is nothing to replicate.

Comments? [~jdcryans], [~stack].

If the approach sounds good, I'll make a patch for all branches.


> Improve replication metrics
> ---------------------------
>
>                 Key: HBASE-11143
>                 URL: https://issues.apache.org/jira/browse/HBASE-11143
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.99.0, 0.94.20, 0.98.3
>
>         Attachments: 11143-0.94-v2.txt, 11143-0.94.txt, 11143-trunk.txt
>
>
> We are trying to report on replication lag and find that there is no good single metric
> to do that.
> ageOfLastShippedOp is close, but unfortunately it is increased even when there is nothing
> to ship on a particular RegionServer.
> I would like to discuss a few options here:
> Add a new metric: replicationQueueTime (or something) with the above meaning. I.e. if
> we have something to ship we set it to the age of the last shipped edit; if we fail we
> increment that time (just like we do now). But if there is nothing to replicate we set it
> to the current time (and hence the metric is reported as close to 0).
> Alternatively we could change the meaning of ageOfLastShippedOp to do that. That
> might lead to surprises, but the current behavior is clearly weird when there is nothing to
> replicate.
> Comments? [~jdcryans], [~stack].
> If the approach sounds good, I'll make a patch for all branches.
> Edit: Also adds a new 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
