hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3376) DFSClient fails to make connection to DN if there are many unusable cached sockets
Date Tue, 08 May 2012 17:25:48 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270627#comment-13270627 ]

Todd Lipcon commented on HDFS-3376:
-----------------------------------

bq. Perhaps a naive question, but why can't socket.isClosed() be used to determine if the socket is unusable? The closed sockets could be skipped and removed from the cache.

Unfortunately the .isClosed() method just checks a local flag which is set by close(). Here's
the JDK source:
{code}
    public boolean isClosed() {
        synchronized(closeLock) {
            return closed;
        }
    }
{code}
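To see that limitation concretely, here's a small standalone demo (my own illustration, not HDFS code): after the *peer* closes the connection, isClosed() still reports false; only an actual read observes the EOF.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class IsClosedDemo {
    // Probes Socket.isClosed() around a peer-initiated close. Returns
    // {isClosed after peer close, read saw EOF, isClosed after local close}.
    static boolean[] probe() throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {           // ephemeral port
            Socket client = new Socket("localhost", server.getLocalPort());
            Socket accepted = server.accept();
            accepted.close();                                       // the *peer* closes

            boolean closedAfterPeer = client.isClosed();            // still false: local flag only
            boolean sawEof = client.getInputStream().read() == -1;  // read() does see the close
            client.close();
            boolean closedAfterLocal = client.isClosed();           // true: close() set the flag
            return new boolean[] { closedAfterPeer, sawEof, closedAfterLocal };
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = probe();
        System.out.println("isClosed() after peer close:  " + r[0]); // false
        System.out.println("read() observed EOF:          " + r[1]); // true
        System.out.println("isClosed() after local close: " + r[2]); // true
    }
}
```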

It may be possible to determine closed-ness by setting up a selector and selecting only for
errors, but that seems somewhat complicated and for not much gain.
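For a sense of what that would involve, here is a rough sketch of the selector idea (my illustration, not code from any patch; it assumes the cached connection is a SocketChannel, which the Socket-based cache doesn't actually give you, and since Java NIO has no error-only interest op, it selects for readability and then peeks with a read, which could consume real application data):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class StaleCheckSketch {
    // Best-effort check: returns true only if we positively observe that the
    // peer has closed the channel within timeoutMs.
    static boolean looksClosed(SocketChannel ch, long timeoutMs) throws IOException {
        ch.configureBlocking(false);
        try (Selector sel = Selector.open()) {
            ch.register(sel, SelectionKey.OP_READ);
            if (sel.select(timeoutMs) == 0) {
                return false;                       // nothing readable: presumed still open
            }
            // Readable means either real data or EOF. A positive read here
            // would consume application data -- part of why this is awkward.
            return ch.read(ByteBuffer.allocate(1)) == -1;
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("localhost", 0));
            SocketChannel client = SocketChannel.open(server.getLocalAddress());
            SocketChannel accepted = server.accept();

            System.out.println("before peer close: " + looksClosed(client, 100));
            accepted.close();
            System.out.println("after peer close:  " + looksClosed(client, 1000));
            client.close();
        }
    }
}
```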
                
> DFSClient fails to make connection to DN if there are many unusable cached sockets
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-3376
>                 URL: https://issues.apache.org/jira/browse/HDFS-3376
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 2.0.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: hdfs-3376.txt
>
>
> After fixing the datanode side of keepalive to properly disconnect stale clients (HDFS-3357),
> the client side has the following issue: when it connects to a DN, it first tries to use cached
> sockets, and will try a configurable number of sockets from the cache. If there are more cached
> sockets than the configured number of retries, and all of them have been closed by the datanode
> side, then the client will throw an exception and mark the replica node as dead.
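The retry-budget problem described above can be sketched in isolation (illustration only; the class and method names are hypothetical, not taken from hdfs-3376.txt):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical model of the client's connect path; S stands in for a socket.
class CachedConnector<S> {
    private final Deque<S> cache = new ArrayDeque<>();
    private final Predicate<S> usable;   // e.g. "has the DN closed this socket?"
    private final int maxAttempts;       // the configurable retry budget

    CachedConnector(Predicate<S> usable, int maxAttempts) {
        this.usable = usable;
        this.maxAttempts = maxAttempts;
    }

    void cacheSocket(S s) { cache.push(s); }

    // Buggy shape: every stale cached socket burns one attempt, so a cache
    // holding more stale sockets than maxAttempts fails before a fresh
    // connection is ever tried.
    S connectBuggy(Supplier<S> freshConnection) {
        for (int i = 0; i < maxAttempts; i++) {
            S s = cache.poll();
            if (s == null) return freshConnection.get();
            if (usable.test(s)) return s;
            // stale: discarded, but an attempt was consumed
        }
        throw new IllegalStateException("out of attempts; replica marked dead");
    }

    // Fixed shape: stale cached sockets are free to discard; fall through to
    // a fresh connection once the cache is drained.
    S connectFixed(Supplier<S> freshConnection) {
        S s;
        while ((s = cache.poll()) != null) {
            if (usable.test(s)) return s;
        }
        return freshConnection.get();
    }
}
```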

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
