hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag
Date Tue, 22 Jul 2014 16:48:40 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070490#comment-14070490 ]

Andrew Purtell commented on HBASE-9531:
---------------------------------------

The findbugs and test failure seem unrelated to this patch.

lgtm

ClusterStatus, ServerLoad, etc. should eventually die and be replaced with metrics usage, but
we have someone who needs this today.

Let's commit to 0.98+

Ping [~enis] for branch-1.

> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9531
>                 URL: https://issues.apache.org/jira/browse/HBASE-9531
>             Project: HBase
>          Issue Type: New Feature
>          Components: Replication
>    Affects Versions: 0.99.0
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.99.0, 0.98.5
>
>         Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
>
> This jira is to provide a command line (hbase shell) interface to retrieve replication
> metrics such as ageOfLastShippedOp, timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp,
> and timeStampsOfLastAppliedOp, and also to provide point-in-time info on the replication lag
> (source only).
> I understand that hbase uses Hadoop metrics (http://hbase.apache.org/metrics.html),
> which is a common way to monitor metric info. This jira is meant to serve as a light-weight client
> interface, compared to a complete (certainly better, but heavier) GUI monitoring package.
> I have the code working on 0.94.9 now, and would like to use this jira to get opinions on whether
> the feature is valuable to other users/workshop. If so, I will build a trunk patch.
> All inputs are greatly appreciated. Thank you!
> The overall design is to reuse the existing logic that supports the hbase shell command
> 'status', and to introduce a new module called ReplicationLoad. In HRegionServer.buildServerLoad(),
> the local replication service objects are used to get their loads, which are wrapped in
> a ReplicationLoad object and then simply passed to the ServerLoad. In ReplicationSourceMetrics
> and ReplicationSinkMetrics, a few getters and setters will be created, and Replication is asked
> to build a "ReplicationLoad". (Many thanks to Jean-Daniel for his kind suggestions on the
> dev email list.)
> The replication lag will be calculated for the source only, using this formula:
> {code:title=Replication lag|borderStyle=solid}
> 	if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
> 	else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipment happened recently
> 	else lag = 0 // last shipment may have happened last night, so NO real lag although ageOfLastShippedOp is non-zero
> {code}
> The shell output will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>         SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>         SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>         SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>         SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','lag' 
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com: lag = 0
>     hdtest018.svl.ibm.com: lag = 14
>     hdtest015.svl.ibm.com: lag = 0
> {code}
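
For readers who want to experiment with the lag formula quoted in the description, here is a minimal Java sketch of its three branches. It is not taken from the attached patches: the class name ReplicationLagSketch, the method replicationLag, and the nowMs parameter are hypothetical, and it assumes that every value is a long in milliseconds, which the description does not state.

```java
// Hypothetical sketch of the lag formula from the issue description.
// Assumption: all arguments are in milliseconds; names are illustrative only.
final class ReplicationLagSketch {
    private ReplicationLagSketch() {}

    public static long replicationLag(long sizeOfLogQueue,
                                      long ageOfLastShippedOp,
                                      long timeStampOfLastShippedOp,
                                      long nowMs) {
        // Time elapsed since the source last shipped an edit.
        long sinceLastShipped = nowMs - timeStampOfLastShippedOp;

        if (sizeOfLogQueue != 0) {
            // Logs are still queued: err on the large side.
            return Math.max(ageOfLastShippedOp, sinceLastShipped);
        }
        if (sinceLastShipped < 2 * ageOfLastShippedOp) {
            // The last shipment happened recently, so its age is still meaningful.
            return ageOfLastShippedOp;
        }
        // The last shipment is old and nothing is queued: no real lag.
        return 0;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        System.out.println(replicationLag(2, 14, now - 500, now));    // queued logs
        System.out.println(replicationLag(0, 14, now - 20, now));     // recent shipment
        System.out.println(replicationLag(0, 14, now - 60_000, now)); // idle source
    }
}
```

The three calls in main exercise one branch each: a non-empty queue, an empty queue with a recent shipment, and an empty queue with a shipment long in the past.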



--
This message was sent by Atlassian JIRA
(v6.2#6252)
