hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andy Isaacson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-6311) Add support for unix domain sockets to JNI libs
Date Thu, 11 Oct 2012 01:25:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473757#comment-13473757

Andy Isaacson commented on HADOOP-6311:

High level comments:
* We need a design doc explaining how all of these bits work together. This Jira has gone
on long enough that it does not serve as documentation.
* I didn't review the red-black and splay tree implementations at all. I'm not sure why we
expect this to be big/contended enough to deserve anything beyond a trivial hash table, which
takes about 20 lines of C.  (ah, I see the code comes from \*BSD, so that's good at least.
 We should document where and what version it came from for future maintainers' sanity.)

+++ hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/fd_server.h
+ * This file includes some common utilities 
+ * for all native code used in hadoop.
+ */
I don't think this comment is accurate.
+#include <stdint.h>
Please move {{#include}}s to the relevant {{.c}} unless they're needed in the {{.h}} directly.
Doesn't look like it's needed here.
+    memset(&addr, 0, sizeof(struct sockaddr_un));
I prefer to say {{memset(&x, 0, sizeof\(x\))}} so that the code is clearly using the correct
size. I don't feel too strongly about this though.
+    addr.sun_family = AF_UNIX;
+    if (bind(ud->sock, (struct sockaddr*)&addr, sizeof(sa_family_t)) < 0) {
This seems to be using the Linux-proprietary "abstract namespace".  If we do this it should
be a Linux-specific name, not "unixDomainSock" which implies that the code is portable to
other UNIXes such as Darwin/Mac OS or Solaris or FreeBSD.

The abstract socket API is documented at http://www.kernel.org/doc/man-pages/online/pages/man7/unix.7.html

(If I'm wrong and the abstract sockets are supported by other OSes then great! but I'm pretty
sure they're not.)

Talking to Colin offline we confirmed that abstract sockets are Linux-specific, but he pointed
out that {{unixDomainSockCreateAndBind}} handles both abstract sockets and named sockets (in
the {{if(jpath)}} branch).  So the name is OK but we need a comment calling out the abstract
socket use case.  The Linux-specific code will compile OK on other OSes, but it might be useful
if the exception message says "your OS requires an explicit path" on non-Linux (using an {{#ifndef
__linux__}} perhaps).

The control flow is a little confusing but not too bad, it could use a comment perhaps something
like {{/* Client requested abstract socket (see unix(7) for details) by setting path = null.
*/}} in the abstract path case.

+  if (!jpath) {
... 20 lines of code
+  } else {
... 10 lines of code
+  }

I'd reorder them to {code}if (jpath) { ... } else { /* !jpath */ ... } {code} as it's one
less bit-flip to think about.

Could you explain the benefits of using abstract sockets?  Why is it better than a named socket?
 Ideally in a code comment near the implementation, or in this Jira.

+      jthr = newNativeIOException(env, errno, "error autobinding "
+                                  "PF_UNIX socket: ");
I don't recognize the phrase "autobinding".  Is that something specific to abstract sockets?
if so it can be described in the documentation.
+      jthr = newNativeIOException(env, EIO, "getsockname():  "
+               "autobound abstract socket identifier started with / ");
Most of your newnativeIOException texts end with {{: "}} but this one ends with {{/ "}}. Best
to be consistent.
+jthrowable unixDomainSetupSockaddr(JNIEnv *env, const char *id,
I think this function can be static, right?
+#define RETRY_ON_EINTR(ret, expr) do { \
+  ret = expr; \
+} while ((ret == -1) && (errno == EINTR));
This probably wants a maximum retry count (hardcoding 100 or thereabouts should be fine).

+static ssize_t safe_write(int fd, const void *b, size_t c)
+  int res;
+  while (c > 0) {
+    res = write(fd, b, c);
+    if (res < 0) {
+      if (errno != EINTR)
+        return -errno;
+      continue;
+    }
+    c -= res;
+    b = (char *)b + res;
* I'd use a local {{char *p = b}} rather than having the cast in the loop.
* if write returns a too large value (which "cannot happen" but bugs happen) and c underflows,
since it's unsigned the loop will spin forever (2^63 is forever).  I'd explicitly check {{if
(res > c) return;}} before decrementing c.
* {{write(2)}} returns the number written.  Seems like we should do that here too.  If the
user calls safe_write(100), we write 50 and then get ENOSPC, we should return 50 I think.
 But I'm not sure, maybe that's not the right interface contract here.

+  if (!cmsg->cmsg_type == SCM_RIGHTS) {
Should be {{if (cmsg_type != SCM\_RIGHTS)}}.
+  if (setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, (char *)&timeout,
no need to cast the pointer, the fourth argument to {{setsockopt}} is {{void *}}.
+  /** The raw FileDescriptor */
+  int rawFd;
I think "raw" means "Posix filedescriptor" here?
+  pFd->rawFd = dup(rawFd);
Is there any chance we'll see races with fseek on these filedescriptors?  Unfortunately two
{{dup()}}ed fds share a single file offset pointer.  If this is something clients need to
worry about we'll have to document it very prominently.

+  // Unpublish any remaining file descriptors.
+	RB_FOREACH_SAFE(pFd, publishedFdsByCookie,
+                  &data->publishedByCookie, pFdTmp) {
+    RB_REMOVE(publishedFdsByCookie, &data->publishedByCookie, pFd);
+    publishedFdFree(env, pFd);
+	}
there's some indentation craziness going on here.

+static jlong generateRandomCookie(void)
+  uint64_t lo = ((uint64_t)mrand48()) & 0xffffffffULL;
+  uint64_t hi = ((uint64_t)mrand48()) << 32;
+  return (jlong)(lo | hi); 
Given that this is potentially a security-critical cookie, we should use /dev/urandom.
(not /dev/random due to the entropy pool blocking issues.)
+    // See man cmsg3) for more details about how 'ancillary data' works.
missing a ( I think.  I would write {{See cmsg(3) for}}.

Overall the code is pretty solid.  I wish we were not open-coding so much JNI argument string
stuff, it's just asking for trouble, but I don't have a better option.  I'm not entirely convinced
that this isn't overdesigned, but I'll wait for the design doc before coming to conclusions.
> Add support for unix domain sockets to JNI libs
> -----------------------------------------------
>                 Key: HADOOP-6311
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6311
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: native
>    Affects Versions: 0.20.0
>            Reporter: Todd Lipcon
>            Assignee: Colin Patrick McCabe
>         Attachments: 6311-trunk-inprogress.txt, HADOOP-6311.014.patch, HADOOP-6311.016.patch,
HADOOP-6311.018.patch, HADOOP-6311.020b.patch, HADOOP-6311.020.patch, HADOOP-6311.021.patch,
HADOOP-6311-0.patch, HADOOP-6311-1.patch, hadoop-6311.txt
> For HDFS-347 we need to use unix domain sockets. This JIRA is to include a library in
common which adds a o.a.h.net.unix package based on the code from Android (apache 2 license)

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

View raw message