hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10522) JniBasedUnixGroupMapping mishandles errors
Date Mon, 21 Apr 2014 18:26:17 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975831#comment-13975831
] 

Colin Patrick McCabe commented on HADOOP-10522:
-----------------------------------------------

{code}
+    // The following call returns errno. Reading the global errno wihtout
+    // locking is not thread-safe.
{code}

As [~cnauroth] mentioned, this is wrong.  Please remove

{code}
+    pwd = NULL;
+    ret = getpwnam_r(username, &uinfo->pwd, uinfo->buf,
                          uinfo->buf_sz, &pwd);
-    } while ((!pwd) && (errno == EINTR));
{code}

Unfortunately, this is wrong too. :(

{{getgrgrid_r}} does not set {{errno}} (or at least is not documented to do so by POSIX).
 Instead, it returns the error number directly.  Here's the man page on my system:

{code}
       On  success,  getgrnam_r() and getgrgid_r() return zero, and set *result to grp.  If
no matching group record was found, these functions return 0 and store NULL in *result.  In
case of error, an error number is returned, and NULL is
       stored in *result.
{code}

Notice that {{errno}} is not mentioned.

{code}
   ret = hadoop_user_info_fetch(uinfo, username);
-  if (ret == ENOENT) {
-    jgroups = (*env)->NewObjectArray(env, 0, g_string_clazz, NULL);
+  if (ret) {
+    if (ret == ENOENT) {
+      jgroups = (*env)->NewObjectArray(env, 0, g_string_clazz, NULL);
+    } else { // handle other errors
+      char buf[128];
+      snprintf(buf, sizeof(buf), "getgrouplist: error looking up user. %d (%s)",
+               ret, terror(ret));
+      THROW(env, "java/lang/RuntimeException", buf);
+    }
{code}

Try {{(*env)->Throw(env, newRuntimeException("getgrouplist: error looking up user. %d (%s)",
ret, terror(ret)))}} here instead.

{code}
+  for (i = 0, ret = 0; i < MAX_USER_LOOKUP_TRIES; i++) {
{code}

This is wrong.  Just because we get {{EINTR}} 5 times doesn't mean we should fail.  Maybe
we're just handling a lot of signals (the JVM sends signals internally).  Also, why are we
increasing the buffer size when we get {{EINTR}}?  We should only increase the buffer size
when we get {{ERANGE}}.  I think the better way to do this would be to keep the old loop,
but break out if we got an {{ERANGE}} and the buffer size was above some fixed amount.  However,
this still seems strange to me.  Clearly the underlying library is buggy, if it keeps telling
us {{ERANGE}} forever.  Is there a particular bug we are trying to work around in this patch?

> JniBasedUnixGroupMapping mishandles errors
> ------------------------------------------
>
>                 Key: HADOOP-10522
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10522
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: hadoop-10522.patch
>
>
> The mishandling of errors in the jni user-to-groups mapping modules can cause segmentation
faults in subsequent calls.  Here are the bugs:
> 1) If {{hadoop_user_info_fetch()}} returns an error code that is not ENOENT, the error
may not be handled at all.  This bug was found by [~cnauroth].
> 2)  In {{hadoop_user_info_fetch()}} and {{hadoop_group_info_fetch()}}, the global {{errno}}
is directly used. This is not thread-safe and could be the cause of some failures that disappeared
after enabling the big lookup lock.
> 3) In the above methods, there is no limit on retries.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message