hbase-issues mailing list archives

From "Alex Newman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9286) ageOfLastShippedOp replication metric doesn't update if the slave regionserver is stalled
Date Fri, 23 Aug 2013 02:17:51 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748222#comment-13748222 ]

Alex Newman commented on HBASE-9286:
------------------------------------

Here is what gets sent to the metrics server:
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 2321 1377059826
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 4092 1377059836
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 1695 1377059846
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 7575 1377059856
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 17576 1377059866
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 27575 1377059876
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 37575 1377059886
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 2899 1377059896
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 2853 1377059906
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 4006 1377059916
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 429 1377059926

I suspended the replication server at around 1377059856 and resumed it at around 1377059896.
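The samples above can be checked mechanically. This is a minimal sketch, assuming the middle column is the age in milliseconds and the last column is a Unix timestamp in seconds (inferred from the 10-second spacing and the roughly 10000 growth per sample during the stall; the comment does not say so). It flags samples where the age grew by about as much wall-clock time as elapsed, meaning nothing newer was shipped in that interval. The class, record and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for inspecting the printed ageOfLastShippedOp samples.
final class AgeSeries {
    // One reported sample: metric name, gauge value (assumed ms), timestamp (assumed epoch seconds).
    record Sample(String name, long ageMs, long epochSec) {}

    // Parses a line of the form "metric:<name> <value> <timestamp>".
    static Sample parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3 || !parts[0].startsWith("metric:")) {
            throw new IllegalArgumentException("unexpected line: " + line);
        }
        return new Sample(parts[0].substring("metric:".length()),
                Long.parseLong(parts[1]), Long.parseLong(parts[2]));
    }

    // Returns the timestamps of samples whose age grew by at least the wall-clock
    // time elapsed since the previous sample, minus toleranceMs. A growth that
    // matches elapsed time means no newer edit was shipped in between.
    static List<Long> stalledSamples(List<Sample> samples, long toleranceMs) {
        List<Long> stalled = new ArrayList<>();
        for (int i = 1; i < samples.size(); i++) {
            Sample prev = samples.get(i - 1);
            Sample cur = samples.get(i);
            long elapsedMs = (cur.epochSec() - prev.epochSec()) * 1000L;
            if (cur.ageMs() - prev.ageMs() >= elapsedMs - toleranceMs) {
                stalled.add(cur.epochSec());
            }
        }
        return stalled;
    }
}
```

Fed the eleven lines above with a 500 ms tolerance, stalledSamples returns 1377059866, 1377059876 and 1377059886, all inside the window between suspending and resuming the replication server.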
                
> ageOfLastShippedOp replication metric doesn't update if the slave regionserver is stalled
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-9286
>                 URL: https://issues.apache.org/jira/browse/HBASE-9286
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Alex Newman
>            Assignee: Alex Newman
>         Attachments: 0001-HBASE-9286.-ageOfLastShippedOp-replication-metric-do.patch
>
>
> In replicationmanager:
>     HRegionInterface rrs = getRS();
>     rrs.replicateLogEntries(Arrays.copyOf(this.entriesArray, currentNbEntries));
>     ....
>     this.metrics.setAgeOfLastShippedOp(
>         this.entriesArray[currentNbEntries-1].getKey().getWriteTime());
>     break;
> This looks reasonable, but it is wrong: rrs.replicateLogEntries can block for a very long
> time if the slave server is suspended or unavailable but not down, and until it returns
> setAgeOfLastShippedOp is never called, so the metric stops updating.
> Fortunately this is easy to fix: we just need to call refreshAgeOfLastShippedOp()
> periodically from a separate thread. I've attached a patch that fixes this for CDH4. I can
> make one for trunk and other branches as well if needed; it's a small change.
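A minimal sketch of the proposed fix, assuming the metric publishes age = now minus the write time of the last shipped edit. Only the names setAgeOfLastShippedOp and refreshAgeOfLastShippedOp come from the description above; the AgeOfLastShippedOp class, startRefresher and the injectable clock are hypothetical, and this is not the attached patch.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical stand-in for the replication metrics: remembers the write time of
// the last shipped edit and publishes age = now - writeTime.
final class AgeOfLastShippedOp {
    private final LongSupplier clockMs;
    private long lastWriteTimeMs = 0; // 0 means nothing has been shipped yet
    private long publishedAgeMs = 0;  // value a metrics sink would read

    AgeOfLastShippedOp(LongSupplier clockMs) { this.clockMs = clockMs; }

    // Called by the shipping thread only after replicateLogEntries() returns.
    synchronized void setAgeOfLastShippedOp(long writeTimeMs) {
        lastWriteTimeMs = writeTimeMs;
        publishedAgeMs = clockMs.getAsLong() - writeTimeMs;
    }

    // Recomputes the age against the current time without any new shipment.
    synchronized void refreshAgeOfLastShippedOp() {
        if (lastWriteTimeMs != 0) {
            publishedAgeMs = clockMs.getAsLong() - lastWriteTimeMs;
        }
    }

    synchronized long publishedAgeMs() { return publishedAgeMs; }

    // Calls refreshAgeOfLastShippedOp() every periodMs on its own daemon thread,
    // so the gauge keeps growing while the shipping thread is blocked.
    ScheduledExecutorService startRefresher(long periodMs) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "age-of-last-shipped-op-refresher");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(this::refreshAgeOfLastShippedOp,
                periodMs, periodMs, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

Without the refresher, the published age stays frozen at its last value for as long as replicateLogEntries blocks; with it, the age tracks wall-clock time. Both methods are synchronized so a refresh cannot overwrite a fresher value just published by the shipping thread. The refresh period is a free choice here; the clock is injectable only so the behavior can be tested without sleeping.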

--
This message is automatically generated by JIRA.
