hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7325) Replication reacts slowly on a lightly-loaded cluster
Date Tue, 30 Jul 2013 21:01:49 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724423#comment-13724423 ]

Jean-Daniel Cryans commented on HBASE-7325:
-------------------------------------------

bq. We'd be polling the NN with 1 req/s/RegionServer.

FWIW, when replicating normally we poll the NN far more often than once per second. It'd be
nice if we didn't have to do it that often. If we could at least finish reading the current
block without having to go back, it'd be a huge win.

bq. Are we really concerned about a 10s latency after the cluster was idle for a long time?
There are many other reasons why replication can get behind - further behind than 10s.

I like it since it limits how long it can take to replicate an edit. And users who set up
replication and then try it out in the shell hopefully wouldn't have to wait 10 seconds
(I've seen this twice in the last two weeks).
                
> Replication reacts slowly on a lightly-loaded cluster
> -----------------------------------------------------
>
>                 Key: HBASE-7325
>                 URL: https://issues.apache.org/jira/browse/HBASE-7325
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Gabriel Reid
>            Priority: Minor
>         Attachments: HBASE-7325.patch, HBASE-7325.v2.patch
>
>
> ReplicationSource uses a backing-off algorithm to sleep for an increasing duration when
> an error is encountered in the replication run loop. However, this backing-off is also
> performed when there is nothing found to replicate in the HLog.
> Assuming default settings (1 second base retry sleep time, and maximum multiplier of
> 10), this means that replication takes up to 10 seconds to occur when there is a break of
> about 55 seconds without anything being written. As there is no error condition, and there
> is apparently no substantial load on the regionserver in this situation, it would probably
> make more sense to not back off in non-error situations.
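
For reference, the arithmetic behind the "10 seconds" and "about 55 seconds" figures in the description can be sketched as follows. This is only an illustration of a linear back-off with a capped multiplier, assuming the defaults quoted above (1 second base sleep, maximum multiplier of 10); the class and method names are hypothetical and this is not the actual ReplicationSource code.

```java
public class BackoffSketch {

    // Hypothetical helper: sleep applied after the n-th consecutive empty poll,
    // computed as base * min(n, maxMultiplier).
    public static long sleepMillis(long baseMillis, int attempt, int maxMultiplier) {
        return baseMillis * Math.min(attempt, maxMultiplier);
    }

    // Hypothetical helper: total time slept while the multiplier climbs from 1
    // up to and including maxMultiplier.
    public static long idleMillisUntilMax(long baseMillis, int maxMultiplier) {
        long total = 0;
        for (int attempt = 1; attempt <= maxMultiplier; attempt++) {
            total += sleepMillis(baseMillis, attempt, maxMultiplier);
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed defaults from the issue description: 1 s base, multiplier capped at 10.
        System.out.println(idleMillisUntilMax(1000L, 10)); // prints 55000
        // Once the cap is reached, every further empty poll sleeps the maximum.
        System.out.println(sleepMillis(1000L, 25, 10));    // prints 10000
    }
}
```

Summing the sleeps for multipliers 1 through 10 gives 55 seconds, after which each empty poll sleeps the full 10 seconds, so an edit written just after such a sleep begins can wait up to 10 seconds before it is picked up.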

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
