hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
Date Thu, 24 Jan 2013 13:53:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561626#comment-13561626 ]

Hadoop QA commented on HBASE-7634:
----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12565799/HBASE-7634.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 6 new or modified
tests.

    {color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4159//console

This message is automatically generated.
                
> Replication handling of changes to peer clusters is inefficient
> ---------------------------------------------------------------
>
>                 Key: HBASE-7634
>                 URL: https://issues.apache.org/jira/browse/HBASE-7634
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.96.0
>            Reporter: Gabriel Reid
>         Attachments: HBASE-7634.patch
>
>
> The handling of changes to the region servers in a replication peer cluster is
currently quite inefficient. The list of region servers that are being replicated to is only
updated if a large number of issues are encountered while replicating.
> This can cause it to take quite a while to recognize that a number of the regionservers
in a peer cluster are no longer available. A potentially bigger problem is that if a replication
peer cluster is started with a small number of regionservers, and more regionservers
are then added after replication has started, the additional regionservers will never be used
for replication (unless there are failures on the in-use regionservers).
> Part of the current issue is that, after a replication write has failed on one randomly-chosen
replication peer regionserver, the retry code in ReplicationSource#shipEdits checks a different
randomly-chosen peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up.
If that peer is seen as not down, another randomly-chosen peer is used for writing.
> A second part of the issue is that changes to the list of region servers in a peer cluster
are not detected directly; they are only picked up once a certain number of failures have occurred
when trying to ship edits.
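
To make the two points above concrete, here is a minimal Java sketch of one way the bookkeeping could look: a failure is recorded against the sink that actually failed, and the sink list is replaced whenever the peer cluster's membership changes. This is not the code in the attached patch or in ReplicationSource; the class PeerSinkSketch and its methods chooseSink, reportFailure, and onPeerServersChanged are hypothetical names, and the trigger for onPeerServersChanged (for example a notification from the peer cluster) is an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch only; none of these names come from HBase.
final class PeerSinkSketch {
    // Region servers in the peer cluster that edits may be shipped to.
    private final List<String> sinks = new ArrayList<>();
    // Sinks on which a replication write has failed.
    private final Set<String> badSinks = new HashSet<>();
    private final Random random;

    PeerSinkSketch(List<String> initialSinks, Random random) {
        this.sinks.addAll(initialSinks);
        this.random = random;
    }

    // Pick a random sink, skipping the ones already reported as failed.
    String chooseSink() {
        List<String> candidates = new ArrayList<>(sinks);
        candidates.removeAll(badSinks);
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no usable sinks");
        }
        return candidates.get(random.nextInt(candidates.size()));
    }

    // Record the failure against the sink that failed,
    // rather than probing some other randomly chosen sink.
    void reportFailure(String failedSink) {
        badSinks.add(failedSink);
    }

    // Replace the sink list as soon as the peer's membership changes,
    // instead of waiting for a number of shipping failures.
    // Assumption of this sketch: a fresh list resets the failure bookkeeping.
    void onPeerServersChanged(List<String> currentServers) {
        sinks.clear();
        sinks.addAll(currentServers);
        badSinks.clear();
    }

    // Snapshot of the current sink list.
    List<String> sinks() {
        return new ArrayList<>(sinks);
    }
}
```

With this shape, a region server added to the peer cluster becomes a candidate on the next call to chooseSink after the membership update, without any failures being needed first.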

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
