accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ACCUMULO-3269) nondeterministic failure of MiniAccumuloClusterStartStopTest
Date Mon, 08 Dec 2014 19:59:14 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238372#comment-14238372 ]

Josh Elser edited comment on ACCUMULO-3269 at 12/8/14 7:58 PM:
---------------------------------------------------------------

So, I don't understand why this helps, but when we start ZooKeeper and wait until we can
connect to it, we "spin" fast: we get an exception and we immediately retry.

Adding a {{Thread.sleep(1000)}} in the catch block around sending "ruok" to the ZooKeeper
server appears to prevent the failure. I went from being unable to run the unit tests in the
minicluster module repeatedly for more than a few minutes to being able to run them for
20 minutes...

I have no idea why this "helps".
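
A minimal sketch of the kind of retry loop described above, for illustration only. It is not
the MiniAccumuloClusterImpl code: the class name {{ZooKeeperReadyCheck}}, the method names, and
the timeout parameters are all hypothetical. It sends ZooKeeper's "ruok" command, expects "imok"
back, and sleeps between failed attempts instead of retrying immediately.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not the actual minicluster implementation.
final class ZooKeeperReadyCheck {

  // Sends the "ruok" four-letter command and reports whether the reply is "imok".
  static boolean isOk(String host, int port, int ioTimeoutMs) throws IOException {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), ioTimeoutMs);
      socket.setSoTimeout(ioTimeoutMs);
      socket.getOutputStream().write("ruok".getBytes(StandardCharsets.US_ASCII));
      socket.getOutputStream().flush();

      // Read up to four bytes; the server closes the connection after replying.
      InputStream in = socket.getInputStream();
      byte[] reply = new byte[4];
      int read = 0;
      while (read < reply.length) {
        int n = in.read(reply, read, reply.length - read);
        if (n < 0) {
          break;
        }
        read += n;
      }
      return "imok".equals(new String(reply, 0, read, StandardCharsets.US_ASCII));
    }
  }

  // Polls until the server answers "imok" or timeoutMs elapses.
  // sleepMs is the pause after each failed attempt (the comment above used 1000).
  static boolean waitForZooKeeper(String host, int port, long timeoutMs, long sleepMs)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (System.nanoTime() - deadline < 0) {
      try {
        if (isOk(host, port, 1000)) {
          return true;
        }
      } catch (IOException e) {
        // Typically java.net.ConnectException: Connection refused while ZK is still starting.
      }
      Thread.sleep(sleepMs); // back off instead of spinning
    }
    return false;
  }
}
```

For example, {{ZooKeeperReadyCheck.waitForZooKeeper("localhost", 2181, 20000, 1000)}} would poll
for up to 20 seconds with a one-second pause between attempts. Unlike the change described
above, this sketch also sleeps when the server replies with something other than "imok", which
is an assumption about the desired behavior.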

For context: I had actually shelled out to {{netstat}} and used procfs to figure out which
process was already bound to the port and preventing ZK from coming up, and noticed that I
suddenly stopped getting failures. I assumed that the latency from making those calls was what
made this work (the output from those commands never showed me anything useful).
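
As a rough in-JVM stand-in for the {{netstat}} check, one could probe whether the port can be
bound at all. This is only a sketch: {{PortProbe}} and {{isPortInUse}} are hypothetical names,
and unlike {{netstat}} or procfs it cannot say which process holds the port.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper for illustration only.
final class PortProbe {

  // Returns true if binding the port fails, which usually means another socket holds it.
  // It can also fail for other reasons, e.g. insufficient privileges for a low port.
  static boolean isPortInUse(int port) {
    try (ServerSocket probe = new ServerSocket(port)) {
      return false; // bind succeeded; the probe socket is closed on exit
    } catch (IOException e) {
      return true; // usually java.net.BindException: Address already in use
    }
  }
}
```

The probe is inherently racy: another process can grab the port between the check and the
real bind, so it is a diagnostic aid rather than a guarantee.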


was (Author: elserj):
So, I don't have any understanding as to why this helps, but, when starting ZooKeeper and
we wait to be able to connect to it, we "spin" fast. We get an exception and we immediately
retry.

Adding a {{Thread.sleep(1000)}} in the catch block around sending "ruok" to the ZooKeeper
server appears to prevent this from happening. I went from being unable to run the unit tests
in the minicluster module for more than a few minutes in repetition to being able to run them
for 20mins...

I have no idea why this "helps".

> nondeterministic failure of MiniAccumuloClusterStartStopTest
> ------------------------------------------------------------
>
>                 Key: ACCUMULO-3269
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3269
>             Project: Accumulo
>          Issue Type: Bug
>            Reporter: Adam Fuchs
>            Assignee: Josh Elser
>             Fix For: 1.7.0
>
>
> When building in master (mvn package -P assemble) I got the following error. I ran the build again (also mvn package -P assemble, with no clean in between) and the whole build succeeded.
> {code}
> Running org.apache.accumulo.minicluster.MiniAccumuloClusterStartStopTest
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 31.103 sec <<< FAILURE! - in org.apache.accumulo.minicluster.MiniAccumuloClusterStartStopTest
> multipleStopsIsAllowed(org.apache.accumulo.minicluster.MiniAccumuloClusterStartStopTest) Time elapsed: 20.016 sec  <<< ERROR!
> org.apache.accumulo.minicluster.impl.ZooKeeperBindException: Zookeeper did not start within 20 seconds. Check the logs in /tmp/junit1360063600921880650/logs for errors.  Last exception: java.net.ConnectException: Connection refused
> 	at org.apache.accumulo.minicluster.impl.MiniAccumuloClusterImpl.start(MiniAccumuloClusterImpl.java:548)
> 	at org.apache.accumulo.minicluster.MiniAccumuloCluster.start(MiniAccumuloCluster.java:72)
> 	at org.apache.accumulo.minicluster.MiniAccumuloClusterStartStopTest.multipleStopsIsAllowed(MiniAccumuloClusterStartStopTest.java:57)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
