hadoop-common-issues mailing list archives

From "Alan Burlison (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-12488) DomainSocket: Solaris does not support timeouts on AF_UNIX sockets
Date Fri, 16 Oct 2015 17:17:05 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-12488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Burlison updated HADOOP-12488:
-----------------------------------
    Description: 
From the hadoop-common-dev mailing list:

http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201509.mbox/%3C560B99F6.6010408@oracle.com%3E
http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201510.mbox/%3C560EA6BF.2070001@oracle.com%3E

{quote}
Now that the Hadoop native code builds on Solaris I've been chipping 
away at all the test failures. About 50% of the failures involve 
DomainSocket, either directly or indirectly. That seems to be mainly 
because the tests use DomainSocket to do single-node testing, whereas in 
production it seems that DomainSocket is less commonly used 
(https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html).

The particular problem on Solaris is that socket read/write timeouts 
(the SO_SNDTIMEO and SO_RCVTIMEO socket options) are not supported for 
UNIX domain (PF_UNIX) sockets. Those options are however supported for 
PF_INET sockets. That's because the socket implementation on Solaris is 
split roughly into two parts, for inet sockets and for STREAMS sockets, 
and the STREAMS implementation lacks support for SO_SNDTIMEO and 
SO_RCVTIMEO. As an aside, performance of sockets that use loopback or 
the host's own IP is slightly better than that of UNIX domain sockets on 
Solaris.

I'm investigating getting timeouts supported for PF_UNIX sockets added 
to Solaris, but in the meantime I'm also looking at how this might be 
worked around in Hadoop. One way would be to implement timeouts by 
wrapping all the read/write/send/recv etc calls in DomainSocket.c with 
either poll() or select().
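
A minimal sketch of the poll() variant of the workaround described above. It is not taken from DomainSocket.c; the function name read_with_timeout and its millisecond parameter are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from fd, but give up after timeout_ms
 * milliseconds. A negative timeout_ms waits indefinitely.
 * On timeout it returns -1 with errno set to EAGAIN, mirroring what a
 * blocking read reports when SO_RCVTIMEO expires.
 */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int ready;

    assert(buf != NULL);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Retry if a signal interrupts the wait. Note that this simple loop
     * restarts the full timeout on each retry. */
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    if (ready < 0)
        return -1;          /* poll() itself failed, errno is already set */
    if (ready == 0) {
        errno = EAGAIN;     /* nothing became readable before the deadline */
        return -1;
    }
    return read(fd, buf, len);  /* data, EOF or a socket error is pending */
}
```

The same pattern would apply to write/send with POLLOUT in place of POLLIN.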

The basic idea is to add two new fields to DomainSocket.c to hold the 
read/write timeouts. On platforms that support SO_SNDTIMEO and 
SO_RCVTIMEO these would be unused as setsockopt() would be used to set 
the socket timeouts. On platforms such as Solaris the JNI code would use 
the values to implement the timeouts appropriately.
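
For comparison, a sketch of the native path on platforms where the kernel enforces the timeout. The helper name set_recv_timeout is hypothetical; only setsockopt() and SO_RCVTIMEO come from the text above. On a platform that rejects the option for this socket type, the call would be expected to fail, and a caller could treat that as the cue to fall back to stored timeout values.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/*
 * Hypothetical helper: ask the kernel to enforce a receive timeout of
 * timeout_ms milliseconds on fd. A value of 0 disables the timeout.
 * Returns 0 on success, or -1 with errno set if setsockopt() fails.
 */
int set_recv_timeout(int fd, int timeout_ms)
{
    struct timeval tv;

    assert(timeout_ms >= 0);
    tv.tv_sec = timeout_ms / 1000;              /* whole seconds */
    tv.tv_usec = (timeout_ms % 1000) * 1000;    /* remainder in microseconds */
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}
```

SO_SNDTIMEO would be set the same way for the write side.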

To prevent the code in DomainSocket.c becoming a #ifdef hairball, the 
current socket IO function calls such as accept(), send(), read() etc. 
would be replaced with macros such as HD_ACCEPT. On platforms that 
provide timeouts these would just expand to the normal socket functions; 
on platforms that don't support timeouts they would expand to wrappers 
that implement timeouts for them.
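
One way the macro layer could look, as a sketch only. HD_ACCEPT is the name used above; the build flag HD_EMULATE_TIMEOUTS, the wrapper hd_accept_timeout and the extra timeout argument are assumptions for illustration and do not exist in Hadoop.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical wrapper: accept() with the timeout emulated by poll().
 * addr and len may be NULL, as with accept(). On timeout it returns -1
 * with errno set to EAGAIN. It is compiled unconditionally here only to
 * keep the sketch short.
 */
int hd_accept_timeout(int fd, struct sockaddr *addr, socklen_t *len,
                      int timeout_ms)
{
    struct pollfd pfd;
    int ready;

    assert(timeout_ms >= -1);
    pfd.fd = fd;
    pfd.events = POLLIN;    /* a pending connection shows up as readable */
    pfd.revents = 0;

    ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;          /* poll() failed, errno is already set */
    if (ready == 0) {
        errno = EAGAIN;     /* no connection arrived before the deadline */
        return -1;
    }
    return accept(fd, addr, len);
}

/* Call sites use one spelling; the platform check lives in one place. */
#ifdef HD_EMULATE_TIMEOUTS
#define HD_ACCEPT(fd, addr, len, ms) hd_accept_timeout((fd), (addr), (len), (ms))
#else
#define HD_ACCEPT(fd, addr, len, ms) accept((fd), (addr), (len))
#endif
```

In the native expansion the timeout argument is simply dropped, since the kernel already enforces the value set through setsockopt().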

The only caveat is that all code that does anything to a PF_UNIX 
socket would *always* have to do so via DomainSocket. As far as I can 
tell that's not an issue, but it would have to be borne in mind if any 
changes were made in this area.

Before I set about doing this, does the approach seem reasonable?
{quote}

{quote}
Unfortunately it's not as simple as I'd hoped. For some reason I don't 
really understand, nearly all the JNI methods are declared as static and 
therefore don't get a "this" pointer and as a consequence all the class 
data members that are needed by the JNI code have to be passed in as 
parameters. That also means it's not possible to store the timeouts in 
the DomainSocket fields from within the JNI code. Most of the JNI 
methods should be instance methods rather than static ones, but making 
that change would require some significant surgery to DomainSocket.
{quote}



> DomainSocket: Solaris does not support timeouts on AF_UNIX sockets
> ------------------------------------------------------------------
>
>                 Key: HADOOP-12488
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12488
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: net
>    Affects Versions: 2.7.1
>         Environment: Solaris
>            Reporter: Alan Burlison
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
