Date: Mon, 12 May 2014 17:34:16 +0000 (UTC)
From: "Lars Hofhansl (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-11143) ageOfLastShippedOp metric is confusing

     [ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-11143:
----------------------------------
    Attachment:     (was: 11143-trunk.txt)

> ageOfLastShippedOp metric is confusing
> --------------------------------------
>
>                 Key: HBASE-11143
>                 URL: https://issues.apache.org/jira/browse/HBASE-11143
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
>         Attachments: 11143-0.94-v2.txt, 11143-0.94.txt, 11143-trunk.txt
>
>
> We are trying to report on replication lag and find that there is no good single metric to do that.
> ageOfLastShippedOp is close, but unfortunately it increases even when there is nothing to ship on a particular RegionServer.
> I would like to discuss a few options here:
> Add a new metric: replicationQueueTime (or something) with the above meaning. I.e. if we have something to ship, we set it to the age of the last shipped edit; if we fail, we increment that age (just like we do now). But if there is nothing to replicate, we set the last-shipped time to the current time (and hence the metric is reported as close to 0).
> Alternatively, we could change the meaning of ageOfLastShippedOp to behave that way. That might lead to surprises, but the current behavior is clearly weird when there is nothing to replicate.
> Comments? [~jdcryans], [~stack].
> If the approach sounds good, I'll make a patch for all branches.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
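
For illustration only, the metric semantics proposed in the description above could be sketched roughly as follows. This is a minimal sketch, not the attached patch and not HBase's actual metrics code: the class name ReplicationLagSketch and the methods shipped, shipFailed, and idle are hypothetical, and timestamps are passed in explicitly (milliseconds) so the behavior is easy to follow.

```java
/**
 * Hypothetical sketch of the proposed metric semantics.
 * None of these names come from HBase; they are assumptions for illustration.
 */
class ReplicationLagSketch {
    // Reference timestamp (ms): write time of the last shipped edit,
    // or "now" when the queue was last seen empty.
    private long referenceTimeMs;

    /** Something was shipped: report the age of the last shipped edit. */
    long shipped(long editWriteTimeMs, long nowMs) {
        referenceTimeMs = editWriteTimeMs;
        return nowMs - referenceTimeMs;
    }

    /** Shipping failed: the reference time is unchanged, so the age keeps growing. */
    long shipFailed(long nowMs) {
        return nowMs - referenceTimeMs;
    }

    /** Nothing to replicate: move the reference time to now, so the age is 0. */
    long idle(long nowMs) {
        referenceTimeMs = nowMs;
        return nowMs - referenceTimeMs;
    }
}
```

Under these assumptions, an idle RegionServer reports a lag near 0 instead of an ever-growing age, while a RegionServer that has edits queued but cannot ship them still shows increasing lag.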