hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alan Burlison (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12487) DomainSocket.close() assumes incorrect Linux behaviour
Date Tue, 27 Oct 2015 16:15:27 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14976634#comment-14976634
] 

Alan Burlison commented on HADOOP-12487:
----------------------------------------

If Al Viro thought the suggestions were without merit then I'd have expected him to have simply
terminated the discussion, which he hasn't. The current Linux shutdown() and close() behaviour
are not POSIX compliant. Whether or not the Linux developers think it's worth complying with
POSIX in this area is an interesting discussion, but not really pertinent here as even if
it does change, Hadoop has to deal with the current Linux behaviour.

I understand your *theoretical* race scenario however after looking at the Hadoop code carefully
I can't see how it will ever occur *in practice*. DomainSocket uses a CloseableReferenceCount
to make sure that once the FD encapsulated by the DomainSocket is closed then it isn't used
any more. It therefore doesn't matter if the FD is recycled elsewhere, because the copy of
it inside the DomainSocket is 'dead' and therefore irrelevant.

All the read/write uses of the FD are made from inside DomainSocket as far as I can tell,
and are therefore protected by the CloseableReferenceCount. However the fd field itself is
not private. As far as I can tell the only place the FD is used externally to DomainSocket
is from within DomainSocketWatcher and again, as far as I can tell it's only used for poll(),
DomainSocketWatcher doesn't read or write to it, doesn't open any FDs itself and re-validates
the DomainSocket FDs it uses by calling sock.refCount.unreferenceCheckClosed() etc, so I think
that is safe as well.

Unless I've missed something, I believe the close routine in DomainSocket() could be changed
to call shutdown() immediately followed by close(), then wait for the CloseableReferenceCount
to reach zero and return. If we ignore any failures from shutdown() I think that would then
work on both Linux and Solaris.

If I'm missing a pitfall here somewhere here I'd be grateful if you could point it out, thanks.

> DomainSocket.close() assumes incorrect Linux behaviour
> ------------------------------------------------------
>
>                 Key: HADOOP-12487
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12487
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: net
>    Affects Versions: 2.7.1
>         Environment: Linux Solaris
>            Reporter: Alan Burlison
>            Assignee: Alan Burlison
>         Attachments: shutdown.c
>
>
> I'm getting a test failure in TestDomainSocket.java, in the testSocketAcceptAndClose
test. That test creates a socket which one thread waits on in DomainSocket.accept() whilst
a second thread sleeps for a short time before closing the same socket with DomainSocket.close().
> DomainSocket.close() first calls shutdown0() on the socket before closing close0() -
both those are thin wrappers around the corresponding libc socket calls. DomainSocket.close()
contains the following comment, explaining the logic involved:
> {code}
>           // Calling shutdown on the socket will interrupt blocking system
>           // calls like accept, write, and read that are going on in a
>           // different thread.
> {code}
> Unfortunately that relies on non-standards-compliant Linux behaviour. I've written a
simple C test case that replicates the scenario above:
> # ThreadA opens, binds, listens and accepts on a socket, waiting for connections.
> # Some time later ThreadB calls shutdown on the socket ThreadA is waiting in accept on.
> Here is what happens:
> On Linux, the shutdown call in ThreadB succeeds and the accept call in ThreadA returns
with EINVAL.
> On Solaris, the shutdown call in ThreadB fails and returns ENOTCONN. ThreadA continues
to wait in accept.
> Relevant POSIX manpages:
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/accept.html
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/shutdown.html
> The POSIX shutdown manpage says:
> "The shutdown() function shall cause all or part of a full-duplex connection on the socket
associated with the file descriptor socket to be shut down."
> ...
> "\[ENOTCONN] The socket is not connected."
> Page 229 & 303 of "UNIX System V Network Programming" say:
> "shutdown can only be called on sockets that have been previously connected"
> "The socket \[passed to accept that] fd refers to does not participate in the connection.
It remains available to receive further connect indications"
> That is pretty clear, sockets being waited on with accept are not connected by definition.
Nor is it the accept socket connected when a client connects to it, it is the socket returned
by accept that is connected to the client. Therefore the Solaris behaviour of failing the
shutdown call is correct.
> In order to get the required behaviour of ThreadB causing ThreadA to exit the accept
call with an error, the correct way is for ThreadB to call close on the socket that ThreadA
is waiting on in accept.
> On Solaris, calling close in ThreadB succeeds, and the accept call in ThreadA fails and
returns EBADF.
> On Linux, calling close in ThreadB succeeds but ThreadA continues to wait in accept until
there is an incoming connection. That accept returns successfully. However subsequent accept
calls on the same socket return EBADF.
> The Linux behaviour is fundamentally broken in three places:
> # Allowing shutdown to succeed on an unconnected socket is incorrect.  
> # Returning a successful accept on a closed file descriptor is incorrect, especially
as future accept calls on the same socket fail.
> # Once shutdown has been called on the socket, calling close on the socket fails with
EBADF. That is incorrect, shutdown should just prevent further IO on the socket, it should
not close it.
> The real issue though is that there's no single way of doing this that works on both
Solaris and Linux, there will need to be platform-specific code in Hadoop to cater for the
Linux brokenness. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message