hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4594) container-executor fails to remove directory tree when chmod required
Date Mon, 01 Feb 2016 16:56:40 GMT

    [ https://issues.apache.org/jira/browse/YARN-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126562#comment-15126562
] 

Jason Lowe commented on YARN-4594:
----------------------------------

Thanks for updating the patch!

bq. I can see places in the code that rely on cgroups, which is a kernel feature that MacOS
just doesn't have (and may never have).

Yeah, it cannot work in practice on MacOS from this and other stuff in there.  Sorry I should
have noticed that.  I suppose there's a chance it happens to compile in case someone is doing
a native build on MacOS X, but arguably we should just have those builds skip container-executor
if they already aren't doing so.

Comments on the latest patch:

The unit test fails, probably because we now deference NULL instead of terminating the loop
in this code.  When readdir returns NULL and there's no error, the code will try to dereference
a null de.
{code}
    while (1) {
      struct dirent *de;
      char *new_fullpath = NULL;
      
      errno = 0;
      de = readdir(dfd);
      if (!de) {
        ret = errno;
        if (ret) {
          fprintf(LOGFILE, "readdir(%s) failed: %s\n", name, strerror(ret));
          goto done;
        }
      }
      if (!strcmp(de->d_name, ".")) {
{code}

The code can end up returning success despite an allocation failure because it doesn't initialize
{{ret}} in this error case:
{code}
      if (asprintf(&new_fullpath, "%s/%s", fullpath, de->d_name) < 0) {
        fprintf(LOGFILE, "Failed to allocate string for %s/%s.\n",
                fullpath, de->d_name); 
        goto done;
      }
{code}

recursive_unlink_helper and recursive_unlink_children now returns normal error codes, but
this is still expecting negative codes:
{code}
  ret = recursive_unlink_children(full_path);
  if (ret == -ENOENT) {
    return 0;
  }
{code}

This code would be simpler to read if it didn't negate the error when trying to deal with
it:
{code}
  if (rmdir(full_path) != 0) {
    ret = -errno;
    if (ret != -ENOENT) {
      fprintf(LOGFILE, "Couldn't delete directory %s - %s\n",
              full_path, strerror(-ret));
      return -1;
    }
  }
{code}


> container-executor fails to remove directory tree when chmod required
> ---------------------------------------------------------------------
>
>                 Key: YARN-4594
>                 URL: https://issues.apache.org/jira/browse/YARN-4594
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: YARN-4594.001.patch, YARN-4594.002.patch
>
>
> test-container-executor.c doesn't work:
> * It assumes that realpath(/bin/ls) will be /bin/ls, whereas it is actually /usr/bin/ls
on many systems.
> * The recursive delete logic in container-executor.c fails -- nftw does the wrong thing
when confronted with directories with the wrong mode (permission bits), leading to an attempt
to run rmdir on a non-empty directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message