accumulo-notifications mailing list archives

From "Christopher Tubbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4389) ReplicationOperations().drain(..) may return too quickly
Date Tue, 30 Aug 2016 20:25:20 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450053#comment-15450053 ]

Christopher Tubbs commented on ACCUMULO-4389:
---------------------------------------------

Interestingly, I just had a test failure of {{MultiInstanceReplicationIT.dataReplicatedToCorrectTable}}
where the drain did not finish at all within the IT timeout limit of 20 minutes (timeout.factor=2).
The thread was stuck on line 431 in the 1.8 branch:
{code}
connMaster.replicationOperations().drain(masterTable1, filesFor1);
{code}

{code}
dataReplicatedToCorrectTable(org.apache.accumulo.test.replication.MultiInstanceReplicationIT)
 Time elapsed: 1,200.014 sec  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 1200 seconds
        at org.apache.accumulo.test.replication.MultiInstanceReplicationIT.dataReplicatedToCorrectTable(MultiInstanceReplicationIT.java:431)

dataReplicatedToCorrectTable(org.apache.accumulo.test.replication.MultiInstanceReplicationIT)
 Time elapsed: 1,200.014 sec  <<< ERROR!
java.lang.Exception: Appears to be stuck in thread Thread-69
{code}
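
For anyone trying to reproduce the hang, one way to surface it sooner than the 20-minute IT timeout is to run the blocking call on a separate thread and wait for it with a bound. The following is only a minimal sketch under stated assumptions: {{BoundedCall}} and {{finishesWithin}} are hypothetical names, not part of Accumulo or of this test, and the five-minute bound in the usage comment is an arbitrary illustrative value.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not part of Accumulo): bounds how long a blocking call may take.
final class BoundedCall {
  private BoundedCall() {}

  /**
   * Runs the task on a separate daemon thread and waits at most the given timeout.
   * Returns true if the task completed in time, false if the wait timed out.
   * An exception thrown by the task is rethrown wrapped in ExecutionException.
   */
  static boolean finishesWithin(Callable<Void> task, long timeout, TimeUnit unit)
      throws ExecutionException, InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "bounded-call");
      t.setDaemon(true); // a task that never returns must not keep the JVM alive
      return t;
    });
    Future<Void> future = executor.submit(task);
    try {
      future.get(timeout, unit);
      return true;
    } catch (TimeoutException e) {
      future.cancel(true); // interrupts the worker; the task may still ignore it
      return false;
    } finally {
      executor.shutdownNow();
    }
  }
}

// Hypothetical use with the names from the line quoted above
// (the 5-minute bound is an assumption, not a value from the test):
//
//   boolean drained = BoundedCall.finishesWithin(() -> {
//     connMaster.replicationOperations().drain(masterTable1, filesFor1);
//     return null;
//   }, 5, TimeUnit.MINUTES);
```

A false return only says that the call did not come back within the bound. It does not explain why, so a thread dump taken at that point would still be needed to see where the drain is stuck.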


> ReplicationOperations().drain(..) may return too quickly
> --------------------------------------------------------
>
>                 Key: ACCUMULO-4389
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4389
>             Project: Accumulo
>          Issue Type: Bug
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Critical
>             Fix For: 1.7.3, 1.8.1
>
>
> Was taking a look at some logs from automated tests that [~romil.choksi] sent my way
> and noticed that MultiInstanceReplicationIT was failing infrequently.
> Looking at the output, I can see that the call was returning very quickly (essentially
> in the amount of time the RPC would take on the slow test hardware)
> {noformat}
> Drain completed in 25ms
> {noformat}
> Looking at the implementation of {{MasterClientServiceHandler.drainReplicationTable(...)}},
> it's not handling the references we read from the metadata table correctly. I believe this
> is causing the test to return too quickly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
