accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ACCUMULO-131) ZookeeperInstance gets stuck when given bad host
Date Wed, 10 Oct 2012 00:54:02 GMT

     [ https://issues.apache.org/jira/browse/ACCUMULO-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner updated ACCUMULO-131:
----------------------------------

    Affects Version/s: 1.4.1
                       1.4.0
               Status: Patch Available  (was: Open)

There is no way to distinguish a bogus host from a good host.  I define a bogus
host as a reachable machine:port where zookeeper will never run.  A good host is a machine:port
where zookeeper is running or will run in the future.  The code already handles the case
where you give it a bad DNS name; this is clearly a bad host, and it does not retry.

I have attached a patch that changes the behavior of ZooSession.  The patch throws an exception
if zookeeper cannot be connected to within 2x the zookeeper timeout.  This patch significantly
changes the behavior of Accumulo.  Without this patch, if zookeeper went down, a new Accumulo
client would just wait indefinitely until it came back up.  With this patch, it will time out.
What are people's opinions about applying this to 1.4?
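
For discussion, here is a minimal sketch of the retry behavior described above. It is not the attached patch and does not use the real ZooSession or ZooKeeper classes. The class BoundedConnect, the Attempt interface, and the retrySleepMs parameter are hypothetical names made up for illustration. It assumes that a bad DNS name surfaces as java.net.UnknownHostException and that any other IOException (such as ConnectException) is worth retrying until 2x the timeout has elapsed.

```java
import java.io.IOException;
import java.net.UnknownHostException;

public class BoundedConnect {

    /** Hypothetical stand-in for a single zookeeper connection attempt. */
    public interface Attempt<T> {
        T connect() throws IOException;
    }

    /**
     * Retries the attempt until it succeeds or 2x the timeout has elapsed.
     * A bad DNS name (UnknownHostException) is rethrown without retrying.
     */
    public static <T> T connect(Attempt<T> attempt, long timeoutMs, long retrySleepMs)
            throws IOException, InterruptedException {
        // Deadline is 2x the timeout, tracked on the monotonic clock.
        long deadline = System.nanoTime() + 2 * timeoutMs * 1_000_000L;
        while (true) {
            try {
                return attempt.connect();
            } catch (UnknownHostException e) {
                // Clearly a bad host: fail immediately.
                throw e;
            } catch (IOException e) {
                // Reachable machine:port with no server behind it, or a server
                // that is temporarily down. Give up once the deadline passes.
                if (System.nanoTime() - deadline >= 0) {
                    throw new IOException(
                        "Failed to connect to zookeeper within 2x the timeout ("
                            + timeoutMs + " ms)", e);
                }
                Thread.sleep(retrySleepMs);
            }
        }
    }
}
```

With a loop of this shape, a caller that passes a bogus host gets an IOException after roughly 2x the timeout instead of hanging, and a client started while zookeeper is down fails the same way instead of waiting for it to come back.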
                
> ZookeeperInstance gets stuck when given bad host
> ------------------------------------------------
>
>                 Key: ACCUMULO-131
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-131
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.4.0, 1.4.1
>            Reporter: Keith Turner
>            Assignee: Eric Newton
>             Fix For: 1.4.2
>
>
> Keith Massey reported the following issue on the mailing list.
> {quote}
> A user of ours recently filed a bug with us because our code hung forever when she gave
> us an address for a zookeeper that was not running. I think I've traced the problem into
> org.apache.accumulo.core.zookeeper.ZooSession.connect(). If the connection to the zookeeper
> fails it throws a ConnectException, which gets caught by the catch (IOException) block,
> which logs the message and keeps trying infinitely. It's definitely user error passing in
> an invalid zookeeper. But shouldn't that method bail out after some time?
> Thanks.
> Keith
> {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
