accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: DNS Failures
Date Mon, 10 Nov 2014 00:53:02 GMT
This is a known deficiency in the current client API: the
implementation tends to retry indefinitely and in quick succession.

This works well when the services are functioning or failing
"normally". If your DNS failure is transient, you should recover
automatically. If it's an extended failure, the client just keeps
retrying and appears to hang, which is what you're observing.

It's hard to draw the line between "expected" or recoverable failures
and failures that you want to propagate back to your client. I'm not
sure whether this is planned to be addressed in the new client API
(https://issues.apache.org/jira/browse/ACCUMULO-2589).
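
To make that trade-off concrete, here is a minimal sketch of bounding
retries in application code. It is not part of the Accumulo API:
BoundedRetry, call, hasCause and the maxDnsFailures parameter are all
hypothetical names, and treating repeated UnknownHostException causes
as fatal is only one possible policy.

```java
import java.net.UnknownHostException;
import java.time.Duration;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not part of the Accumulo client API. */
final class BoundedRetry {

    private BoundedRetry() {}

    /** Returns true if t, or any exception in its cause chain, is an instance of type. */
    static boolean hasCause(Throwable t, Class<? extends Throwable> type) {
        // Cap the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (type.isInstance(t)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs op, retrying on failure until either maxWait has elapsed or
     * maxDnsFailures consecutive failures were caused by an
     * UnknownHostException. The last exception is then rethrown.
     */
    static <T> T call(Callable<T> op, Duration maxWait, Duration pause,
                      int maxDnsFailures) throws Exception {
        long deadline = System.nanoTime() + maxWait.toNanos();
        int dnsFailures = 0;
        while (true) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Never retry an interrupt; restore the flag and propagate.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                // Assumed policy: count only back-to-back DNS failures.
                if (hasCause(e, UnknownHostException.class)) {
                    dnsFailures++;
                } else {
                    dnsFailures = 0;
                }
                boolean outOfTime = System.nanoTime() - deadline >= 0;
                if (dnsFailures >= maxDnsFailures || outOfTime) {
                    throw e;
                }
            }
            Thread.sleep(pause.toMillis());
        }
    }
}
```

A wrapper like this only helps if the wrapped call eventually throws
instead of retrying internally, so in practice it would likely need to
be combined with the scan/write timeout mentioned in the quoted message.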

Ariel Valentin wrote:
> We have a very peculiar situation, where a DNS failure is causing our
> application to hang.
>
> Based on the trace debugging logs it appears that the ThriftScanner
> encounters a TTransportException, which was caused by an
> UnknownHostException. It seems to then retry a few seconds later.
>
> http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.accumulo/accumulo-core/1.6.0-cdh4.6.0/org/apache/accumulo/core/client/impl/ThriftScanner.java/#124
>
> https://gist.github.com/arielvalentin/794415d1744e52984d0d
>
> After tracing the code a bit I realized that we could mitigate the
> "hanging" by setting a timeout on our scans/writes; however, I would
> prefer that the client fail faster if it cannot resolve the
> hostnames of the TServers it found in ZooKeeper.
>
> Thoughts? Concerns? Opinions?
>
> Ariel Valentin
> e-mail: ariel@arielvalentin.com <mailto:ariel@arielvalentin.com>
> website: http://blog.arielvalentin.com
> skype: ariel.s.valentin
> twitter: arielvalentin
> linkedin: http://www.linkedin.com/profile/view?id=8996534
> ---------------------------------------
> *simplicity *communication
> *feedback *courage *respect
