hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17328) Properly dispose of looped replication peers
Date Mon, 19 Dec 2016 22:17:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762473#comment-15762473 ]

Andrew Purtell commented on HBASE-17328:
----------------------------------------

Mostly lgtm [~vincentpoon] but I wonder about this: 
{code}
+  @Test(timeout = 300000)
+  public void testLoopedReplication() throws Exception {
+    LOG.info("testLoopedReplication");
+    startMiniClusters(1);
+    createTableOnClusters(table);
+    addPeer("1", 0, 0);
+    Thread.sleep(SLEEP_TIME); // wait for ReplicationSource to terminate
+
+    Table[] htables = getHTablesOnClusters(tableName);
...
{code}
Sleeping for a fixed interval and assuming the state we want was reached during the sleep makes the test brittle.
Could this be rewritten with a waiter and a predicate? See https://hbase.apache.org/testdevapidocs/index.html?org/apache/hadoop/hbase/Waiter.html
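For example, something roughly like this (a sketch only; utilities[0] and the isSourceTerminated() check are placeholders for whatever handle and state the test can actually observe):
{code}
import org.apache.hadoop.hbase.Waiter;

// Poll for the terminated source instead of sleeping for a fixed interval.
Waiter.waitFor(utilities[0].getConfiguration(), SLEEP_TIME, new Waiter.Predicate<Exception>() {
  @Override
  public boolean evaluate() throws Exception {
    // Placeholder check: true once the looped source is observed terminated.
    return isSourceTerminated();
  }
});
{code}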

> Properly dispose of looped replication peers
> --------------------------------------------
>
>                 Key: HBASE-17328
>                 URL: https://issues.apache.org/jira/browse/HBASE-17328
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 2.0.0, 1.4.0, 0.98.23
>            Reporter: Vincent Poon
>            Assignee: Vincent Poon
>            Priority: Critical
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.9
>
>         Attachments: HBASE-17328-1.1.v1.patch, HBASE-17328-master.v1.patch, HBASE-17328-master.v2.patch, HBASE-17328.branch-1.1.v2.patch, HBASE-17328.branch-1.1.v3.patch
>
>
> When adding a looped replication peer (clusterId == peerClusterId), the following code terminates the replication source thread, but since the source manager still holds a reference, WALs continue to get enqueued, and never get cleaned because they're stuck in the queue, leading to an unsustainable buildup. Furthermore, the replication statistics thread will continue to print statistics for the terminated source.
> {code}
> if (clusterId.equals(peerClusterId) && !replicationEndpoint.canReplicateToSameCluster()) {
>   this.terminate("ClusterId " + clusterId + " is replicating to itself: peerClusterId "
>       + peerClusterId + " which is not allowed by ReplicationEndpoint:"
>       + replicationEndpoint.getClass().getName(), null, false);
> }
> {code}
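
A minimal sketch of the disposal idea (hypothetical, not the attached patch; closeQueue() stands in for whatever cleanup hook ReplicationSourceManager would actually expose):
{code}
if (clusterId.equals(peerClusterId) && !replicationEndpoint.canReplicateToSameCluster()) {
  this.terminate("ClusterId " + clusterId + " is replicating to itself: peerClusterId "
      + peerClusterId + " which is not allowed by ReplicationEndpoint:"
      + replicationEndpoint.getClass().getName(), null, false);
  // Hypothetical cleanup: drop this source from the manager's bookkeeping so
  // WALs stop being enqueued for it and the stats thread stops reporting it.
  this.manager.closeQueue(this);
  return;
}
{code}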



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
