accumulo-notifications mailing list archives

From "Michiel Vanderlee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4317) Accumulo client causes 'too many files open' due to infinite loop.
Date Sat, 28 May 2016 16:59:12 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305476#comment-15305476 ]

Michiel Vanderlee commented on ACCUMULO-4317:
---------------------------------------------

Just had this happen on my Accumulo Masters as well. 

Our HDFS cluster broke down and went into safe mode. After we fixed it, the Accumulo masters
reconnected automatically, but when I looked at the logs a few minutes later, I first saw a
ton of these:
{noformat}
2016-05-28 16:27:27,069 [rpc.ThriftUtil] WARN : Failed to open transport to arch05:9997
2016-05-28 16:27:27,069 [master.Master] ERROR: Error processing table state for store Metadata Tablets
org.apache.thrift.transport.TTransportException: java.net.UnknownHostException
        at org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:326)
        at org.apache.accumulo.core.rpc.ThriftUtil.createTransport(ThriftUtil.java:190)
        at org.apache.accumulo.server.master.LiveTServerSet$TServerConnection.assignTablet(LiveTServerSet.java:91)
        at org.apache.accumulo.master.TabletGroupWatcher.flushChanges(TabletGroupWatcher.java:792)
        at org.apache.accumulo.master.TabletGroupWatcher.run(TabletGroupWatcher.java:295)
Caused by: java.net.UnknownHostException
        at sun.nio.ch.Net.translateException(Net.java:175)
        at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:139)
        at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:92)
        at org.apache.accumulo.core.rpc.TTimeoutTransport.create(TTimeoutTransport.java:72)
        at org.apache.accumulo.core.rpc.TTimeoutTransport.create(TTimeoutTransport.java:65)
        at org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:323)
        ... 4 more
{noformat}

Then eventually a ton of these:
{noformat}
2016-05-28 16:46:36,306 [rpc.ThriftUtil] WARN : Failed to open transport to arch05:9997
2016-05-28 16:46:36,306 [master.Master] ERROR: Error processing table state for store Root Table
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
        at org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:326)
        at org.apache.accumulo.core.rpc.ThriftUtil.createTransport(ThriftUtil.java:190)
        at org.apache.accumulo.server.master.LiveTServerSet$TServerConnection.assignTablet(LiveTServerSet.java:91)
        at org.apache.accumulo.master.TabletGroupWatcher.flushChanges(TabletGroupWatcher.java:792)
        at org.apache.accumulo.master.TabletGroupWatcher.run(TabletGroupWatcher.java:295)
Caused by: java.net.SocketException: Too many open files
        at sun.nio.ch.Net.socket0(Native Method)
        at sun.nio.ch.Net.socket(Net.java:438)
        at sun.nio.ch.Net.socket(Net.java:431)
        at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:118)
        at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:72)
        at org.apache.accumulo.core.rpc.TTimeoutTransport.create(TTimeoutTransport.java:69)
        at org.apache.accumulo.core.rpc.TTimeoutTransport.create(TTimeoutTransport.java:65)
        at org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:323)
        ... 4 more
{noformat}



I wonder if the issue is that the Socket doesn't get closed on exception.
{code:title=TTimeoutTransport.java|borderStyle=solid}
public static TTransport create(SocketAddress addr, long timeoutMillis) throws IOException
{
    Socket socket = SelectorProvider.provider().openSocketChannel().socket();
    socket.setSoLinger(false, 0);
    socket.setTcpNoDelay(true);
    socket.connect(addr);
    InputStream input = new BufferedInputStream(getInputStream(socket, timeoutMillis), 1024 * 10);
    OutputStream output = new BufferedOutputStream(NetUtils.getOutputStream(socket, timeoutMillis), 1024 * 10);
    return new TIOStreamTransport(input, output);
}
{code}
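
To make the suspicion concrete, here is a minimal sketch of what closing the socket on a failed connect could look like. It is not the Accumulo code and not a proposed patch: the class SafeConnect and the method connectOrClose are hypothetical names, it uses only java.net, it leaves out the Thrift stream wrapping, and it passes the timeout to connect(), which the snippet above does not do.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketAddress;

// Hypothetical helper, not part of Accumulo.
final class SafeConnect {
    private SafeConnect() {}

    // Configures and connects the given socket. If anything fails before the
    // connection is established, the socket is closed so that its file
    // descriptor is released, and the original exception propagates.
    static Socket connectOrClose(Socket socket, SocketAddress addr, int timeoutMillis)
            throws IOException {
        boolean connected = false;
        try {
            socket.setSoLinger(false, 0);
            socket.setTcpNoDelay(true);
            socket.connect(addr, timeoutMillis); // a timeout of 0 means wait indefinitely
            connected = true;
            return socket;
        } finally {
            if (!connected) {
                try {
                    socket.close();
                } catch (IOException ignored) {
                    // Best effort: the failure from connect() is the one worth reporting.
                }
            }
        }
    }
}
```

The flag plus finally covers unchecked exceptions as well as IOException. If each failed attempt in a retry loop left one descriptor open, that would fit the logs going from UnknownHostException to "Too many open files", but I have not verified that this is the actual cause.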

> Accumulo client causes 'too many files open' due to infinite loop.
> ------------------------------------------------------------------
>
>                 Key: ACCUMULO-4317
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4317
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.7.1
>            Reporter: Michiel Vanderlee
>            Priority: Minor
>
> Accumulo stores hostnames in ZooKeeper. If the client cannot resolve these, it will
> continue to try to connect in a while(true) loop. This will eventually cause 'too many files
> open' errors.
> Loop is in ServerClient.java$executeRaw
> Bug: Should error out after some time, not retry infinitely.
> Workaround: Add hostnames to /etc/hosts and restart.
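
For the "error out after some time" point in the description above, a bounded retry could look roughly like the sketch below. It is only an illustration under assumptions: BoundedRetry, callWithDeadline, maxWaitMillis and pauseMillis are hypothetical names, and this is not how ServerClient is written.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Accumulo.
final class BoundedRetry {
    private BoundedRetry() {}

    // Calls attempt until it succeeds. An IOException triggers a retry after
    // pauseMillis, but once maxWaitMillis has elapsed the last failure is
    // rethrown wrapped in a new IOException instead of looping forever.
    static <T> T callWithDeadline(Callable<T> attempt, long maxWaitMillis, long pauseMillis)
            throws Exception {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        while (true) {
            try {
                return attempt.call();
            } catch (IOException e) {
                if (System.nanoTime() - deadline >= 0) {
                    throw new IOException("giving up after " + maxWaitMillis + " ms", e);
                }
                Thread.sleep(pauseMillis);
            }
        }
    }
}
```

The pause between attempts also slows the rate at which new sockets are opened, so a descriptor leak in the connect path would take longer to hit the open-file limit.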



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
