accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ACCUMULO-3036) 1.5 MiniCluster fails to start, forces clients to wait indefinitely
Date Thu, 13 Nov 2014 17:34:34 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210060#comment-14210060 ]

Josh Elser edited comment on ACCUMULO-3036 at 11/13/14 5:33 PM:
----------------------------------------------------------------

One easy fix would be to watch ZooKeeper and wait for the locks for the started processes
to be acquired. If they fail to do so after some period of time, we can abort.

If we return from {{start()}} before the locks are actually held, the client is just going to
sit there spinning its wheels trying to connect anyway. This would also be generally
applicable to all versions, not just 1.5.
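A minimal sketch of the proposed wait, in Java since Accumulo is written in Java. It is not Accumulo's implementation: it polls instead of setting ZooKeeper watches, and each `BooleanSupplier` is a hypothetical stand-in for a check such as "is this process's lock node present in ZooKeeper". The `StartupWaiter` class and `awaitLocks` method are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: block until every started process reports that it holds its lock. */
final class StartupWaiter {
  private StartupWaiter() {}

  /**
   * @param lockHeld  process name -> check for whether that process currently holds its lock
   *                  (in a real cluster this would consult ZooKeeper; here it is an assumption)
   * @param timeoutMs how long to wait before aborting
   * @param pollMs    delay between rounds of checks
   * @throws TimeoutException naming the processes that never acquired their locks
   */
  static void awaitLocks(Map<String, BooleanSupplier> lockHeld, long timeoutMs, long pollMs)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      // Collect every process whose lock is not held yet.
      List<String> missing = new ArrayList<>();
      for (Map.Entry<String, BooleanSupplier> e : lockHeld.entrySet()) {
        if (!e.getValue().getAsBoolean()) {
          missing.add(e.getKey());
        }
      }
      if (missing.isEmpty()) {
        return; // every process holds its lock, so start() could safely return
      }
      if (System.nanoTime() - deadline >= 0) {
        // Abort instead of letting clients spin forever trying to connect.
        throw new TimeoutException("Locks not acquired within " + timeoutMs + " ms: " + missing);
      }
      Thread.sleep(pollMs);
    }
  }
}
```

If `awaitLocks` throws, a hypothetical {{start()}} could tear the cluster down and rethrow, so a misconfigured test fails quickly instead of running into the JUnit timeout.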


was (Author: elserj):
One easy fix would be to watch ZooKeeper and wait for the locks for the started processes
to be acquired. If they fail to do so after some period of time, we can abort.

If we return from {{start()}} before the locks are actually held, the client is just going to
sit there spinning its wheels trying to connect anyway.

> 1.5 MiniCluster fails to start, forces clients to wait indefinitely
> -------------------------------------------------------------------
>
>                 Key: ACCUMULO-3036
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3036
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: mini
>    Affects Versions: 1.5.0, 1.5.1
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 1.5.3
>
>
> Over in Pig land, a user was complaining about a test using MiniAccumuloCluster that hung until the JUnit timeout was hit.
> Eventually, the problem was diagnosed as a bad classpath (an old version of Thrift was included and used), which caused the TServer and Master to bail out immediately. However, the client sat indefinitely, trying unsuccessfully to connect.
> MAC#start should not return before we're sure that the processes are actually up and running (a very quick smoke test).
> It looks like ACCUMULO-1537 introduced a call to SetGoalState on the Master before MAC#start returned, which would (I assume) fail and then throw an RTE (RuntimeException) if the Master decided to die. Including this fix in 1.5 may be sufficient to resolve the underlying issue the user was seeing.
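The "very quick smoke test" in the description could start even simpler than a lock check: confirm that the launched server processes have not already exited. This sketch assumes the cluster keeps a `java.lang.Process` handle per server; `ProcessSmokeTest` and `checkStillRunning` are hypothetical names, and the grace period is an arbitrary parameter, not something Accumulo defines.

```java
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical smoke test: fail if any launched server process exits during a short grace period. */
final class ProcessSmokeTest {
  private ProcessSmokeTest() {}

  static void checkStillRunning(Map<String, Process> processes, long graceMs, long pollMs)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(graceMs);
    while (true) {
      for (Map.Entry<String, Process> e : processes.entrySet()) {
        Process p = e.getValue();
        if (!p.isAlive()) {
          // A process that died right away (e.g. from a bad classpath) should fail start() loudly.
          throw new IllegalStateException(
              e.getKey() + " exited during startup with code " + p.exitValue());
        }
      }
      if (System.nanoTime() - deadline >= 0) {
        return; // every process survived the grace period
      }
      Thread.sleep(pollMs);
    }
  }
}
```

This catches the bad-classpath case, where the TServer and Master bail out immediately, but not a process that stays up without ever becoming usable; that case still needs something like a lock check.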



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
