curator-dev mailing list archives

From "Andy Sloane (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CURATOR-229) No retry on DNS lookup failure
Date Mon, 10 Apr 2017 23:05:42 GMT

    [ https://issues.apache.org/jira/browse/CURATOR-229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963625#comment-15963625 ]

Andy Sloane commented on CURATOR-229:
-------------------------------------

Right, there are two cases (permanent and temporary), but I would argue the behavior is
undesirable in both.

If the host is truly not resolvable, then you get the above background thread exception logged,
and... nothing else is obviously wrong. {{CuratorFrameworkImpl.start()}} returns without issue
while the background thread hangs, and unless you've registered an {{UnhandledErrorListener}},
there's no API-level indication that anything is wrong, at least if you're using simple things
like {{LeaderLatch}}.

If it's a temporary DNS failure and retrying would work, then retrying in the background
and not complaining in {{start()}} is fine. But if it's permanent, you're stuck, and the
configuration error never really bubbles up to the surface.
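
To illustrate the distinction, here is a minimal sketch (plain JDK, not Curator code) of a
lookup that retries a bounded number of times and then throws, so a temporary failure recovers
while a permanent one reaches the caller. {{DnsRetry}}, {{Resolver}} and {{resolveWithRetry}} are
hypothetical names invented for this example; the attempt count and sleep are arbitrary.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper for illustration only; none of these names exist in Curator.
class DnsRetry {
    /** Seam around the DNS lookup so it can be stubbed; InetAddress::getByName fits it. */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    /**
     * Tries the lookup up to maxAttempts times, sleeping sleepMs between failures.
     * Returns the first successful result, or rethrows the last UnknownHostException
     * once all attempts are used up.
     */
    static InetAddress resolveWithRetry(Resolver resolver, String host, int maxAttempts, long sleepMs)
            throws UnknownHostException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        UnknownHostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return resolver.resolve(host);
            } catch (UnknownHostException e) {
                last = e; // possibly temporary: remember it and try again
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMs);
                }
            }
        }
        throw last; // treated as permanent: surface it instead of hanging silently
    }

    public static void main(String[] args) throws Exception {
        // "localhost" is used only so the demo does not depend on an external DNS server.
        System.out.println(resolveWithRetry(InetAddress::getByName, "localhost", 3, 100L));
    }
}
```

Where the cutoff between "temporary" and "permanent" sits is a policy choice; the point is only
that the failure eventually reaches the caller rather than just a log line.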

Even just throwing the error from {{CuratorFrameworkImpl.start()}} to the caller, instead of
treating it as a background exception, would improve the situation. And if the client was
previously connected and is reconnecting outside of {{start()}}, then attempting to reconnect
to ZooKeeper makes sense.
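
Until something like that exists, an application could approximate it with a pre-flight check
before calling {{start()}}. The sketch below is plain JDK code, not part of Curator:
{{ConnectStringCheck}}, {{Resolver}}, {{hosts}} and {{requireResolvable}} are hypothetical names.
It assumes the usual ZooKeeper connect string shape (comma-separated host:port pairs with an
optional chroot path), does not handle bracketed IPv6 literals, and takes the strict view that
any unresolvable host is an error.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check for illustration only; not a Curator or ZooKeeper API.
class ConnectStringCheck {
    /** Seam around the DNS lookup so it can be stubbed; InetAddress::getByName fits it. */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    /** Extracts host names from a string such as "zk1:2181,zk2:2181/app". */
    static List<String> hosts(String connectString) {
        int slash = connectString.indexOf('/');
        String servers = slash >= 0 ? connectString.substring(0, slash) : connectString;
        List<String> result = new ArrayList<>();
        for (String server : servers.split(",")) {
            String s = server.trim();
            if (s.isEmpty()) {
                continue;
            }
            int colon = s.lastIndexOf(':');
            result.add(colon >= 0 ? s.substring(0, colon) : s); // drop the port if present
        }
        return result;
    }

    /** Throws UnknownHostException for the first host that does not resolve. */
    static void requireResolvable(String connectString, Resolver resolver) throws UnknownHostException {
        for (String host : hosts(connectString)) {
            try {
                resolver.resolve(host);
            } catch (UnknownHostException e) {
                UnknownHostException wrapped = new UnknownHostException(
                        "Cannot resolve '" + host + "' from connect string '" + connectString + "'");
                wrapped.initCause(e);
                throw wrapped;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // "localhost" is used only so the demo does not depend on an external DNS server.
        requireResolvable("localhost:2181", InetAddress::getByName);
        System.out.println(hosts("zk1.example.com:2181,zk2.example.com:2181/app"));
    }
}
```

Calling {{requireResolvable(connectString, InetAddress::getByName)}} just before {{start()}} would
turn a mistyped host name into an exception at startup. It does nothing for the reconnect case,
where retrying in the background still seems like the right behavior.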


> No retry on DNS lookup failure
> ------------------------------
>
>                 Key: CURATOR-229
>                 URL: https://issues.apache.org/jira/browse/CURATOR-229
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 2.7.0
>            Reporter: Michael Putters
>
> Our environment is set up so that host names (rather than IP addresses) are used when
> registering services.
> When a node is disconnected from the network, it attempts to reconnect and, in order
> to do this, attempts to resolve a host name, which fails (since we have no network
> connectivity and a DNS server is used).
> It appears this type of exception is not retryable, and the node simply gives up and
> never reconnects, even when network connectivity is back.
> Is this the expected behavior? Is there any way to configure Curator so that this type
> of exception is retryable? I had a look at {{CuratorFrameworkImpl.java}} around line 768 but
> there doesn't seem to be anything configurable.
> If this is not the expected behavior (or if it is but you don't mind making it configurable),
> I should be able to provide a patch via a pull request.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
