hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor
Date Mon, 20 Oct 2014 21:12:34 GMT

    [ https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177477#comment-14177477
] 

zhihai xu commented on YARN-2701:
---------------------------------

Hi [~xgong], thanks for the details explanation. The explanation sounds reasonable to me.


Some nits:
1. since we only check permission for the final directory component,
I think we also need check finalComponent in the first call check_permission.
change
{code}
        } else if (check_permission(sb.st_mode, perm) == -1) {
{code}
to
{code}
        } else if (finalComponent == 1 && check_permission(sb.st_mode, perm) == -1)
{
{code}

2. Can we create a new function check_dir to remove the duplicate code which verify the existing
directory at two places?
We can also remove function check_permission by moving check_permission code into check_dir.
This is check_dir function:
{code}
int check_dir(char* npath, mode_t st_mode, mode_t desired, int finalComponent) {
    // Check whether it is a directory
    if (!S_ISDIR (st_mode)) {
      fprintf(LOGFILE, "Path %s is file not dir\n", npath);
      return -1;
   } else if (finalComponent == 1) {
  int filePermInt = st_mode & (S_IRWXU | S_IRWXG | S_IRWXO);
  int desiredInt = desired & (S_IRWXU | S_IRWXG | S_IRWXO);
  if (filePermInt != desiredInt) {
      fprintf(LOGFILE, "Path %s does not have desired permission.\n", npath);
      return -1;
    }
   }
   return 0;
}
{code}

3. Can we move free(npath); from create_validate_dirs to mkdirs?
It will be better to free the memory at the same function(mkdirs) which allocated the memory.
in mkdirs 
{code}
if (create_validate_dirs(npath, perm, path, 0) == -1) {
      free(npath);
       return -1;
     }
{code}

4. a little more optimization to remove redundant code:
we can merge these two piece of code:
        fprintf(LOGFILE, "Can't create directory %s in %s - %s\n", npath,
                path, strerror(errno));
by 
if (errno != EEXIST || stat(npath, &sb) != 0) {

The code after change will be like the following: 
{code}
int create_validate_dir(char* npath, mode_t perm, char* path, int finalComponent) {
  struct stat sb;
  if (stat(npath, &sb) != 0) {
    if (mkdir(npath, perm) != 0) {
      if (errno != EEXIST  || stat(npath, &sb) != 0) {
        fprintf(LOGFILE, "Can't create directory %s in %s - %s\n", npath,
                path, strerror(errno));
        return -1;
      }
      // The directory npath should exist.
      if (check_dir(npath, sb.st_mode, perm, finalComponent) == -1) {
          return -1;
       }
    }
  } else if(check_dir(npath, sb.st_mode, perm, finalComponent) == -1){
      return -1;
  }
  return 0;
}
{code}

5. Can we change the name create_validate_dirs to create_validate_dir? since we only create
one directory in create_validate_dirs.

> Potential race condition in startLocalizer when using LinuxContainerExecutor  
> ------------------------------------------------------------------------------
>
>                 Key: YARN-2701
>                 URL: https://issues.apache.org/jira/browse/YARN-2701
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>            Priority: Blocker
>         Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch
>
>
> When using LinuxContainerExecutor do startLocalizer, we are using native code container-executor.c.

> {code}
>      if (stat(npath, &sb) != 0) {
>        if (mkdir(npath, perm) != 0) {
> {code}
> We are using check and create method to create the appDir under /usercache. But if there
are two containers trying to do this at the same time, race condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message