hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6708) Nodemanager container crash after ext3 folder limit
Date Tue, 27 Jun 2017 21:50:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065535#comment-16065535
] 

Jason Lowe commented on YARN-6708:
----------------------------------

Thanks for updating the patch!  This looks a lot cleaner.

Is there a reason to use 755 permissions on the intermediate directories in the user cache?
 Note that we only allow 710 permissions on the final directly, and it seems intermediate
directories should only require that as well, or 750 at the most.  I don't see any reason
to allow any "other" permissions on user-specific directories in the local cache.

If the parent of destDirPath is the cache root then we won't set the permissions of destDirPath
but otherwise we will?  Seems like we could end up with inconsistent permissions on dest directories
since sometimes the disk validator will end up creating it.  I think we should just always
do the while loop rather than special-case the no-intermediate-directories case -- or am I
missing something?

Nit: It would improve readability if we moved the directory handling stuff to a utility method
like createDirAndParents or something similar and pass the desired permissions for the dirs
as an argument.

There be an After method that deletes {{basedir}} so we don't leave cruft on the filesystem
if a unit test fails.

The AtomicLong use is overkill in the test since there's no thread contention on that object.
 Actually the test could be a _lot_ simpler if we just told the container localizer to download
a singl file to basedir/0/0/85/ and verify the permissions of the directories are correct.
 No need to download a bunch of times -- we're not testing the LocalCacheDirectoryManager
here, and if the LocalCacheDirectoryManager internals change then this test is going to fail
since it knows a bit too much about it.

Similarly I do not see the need to actually create a jar and copy it.  We can mock out the
downloading process (see the {{mockOutDownloads}} method) so we don't have to have a real
file on disk to download.  The localizer will still create the directories but the executor
service won't actually call FSDownload, and that's fine for what we're trying to test.

The test should not be using reflection to modify access to objects.  The user cache permissions
can be package private, and we can just replicate the FSDownload permissions for this test
if really necessary.


> Nodemanager container crash after ext3 folder limit
> ---------------------------------------------------
>
>                 Key: YARN-6708
>                 URL: https://issues.apache.org/jira/browse/YARN-6708
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: YARN-6708.001.patch, YARN-6708.002.patch, YARN-6708.003.patch, YARN-6708.004.patch
>
>
> Configure umask as *027* for nodemanager service user
> and {{yarn.nodemanager.local-cache.max-files-per-directory}} as {{40}}. After 4  *private*
dir localization next directory will be *0/14*
> Local Directory cache manager 
> {code}
> vm2:/opt/hadoop/release/data/nmlocal/usercache/mapred/filecache # l
> total 28
> drwx--x--- 7 mapred hadoop 4096 Jun 10 14:35 ./
> drwxr-s--- 4 mapred hadoop 4096 Jun 10 12:07 ../
> drwxr-x--- 3 mapred users  4096 Jun 10 14:36 0/
> drwxr-xr-x 3 mapred users  4096 Jun 10 12:15 10/
> drwxr-xr-x 3 mapred users  4096 Jun 10 12:22 11/
> drwxr-xr-x 3 mapred users  4096 Jun 10 12:27 12/
> drwxr-xr-x 3 mapred users  4096 Jun 10 12:31 13/
> {code}
> *drwxr-x---* 3 mapred users  4096 Jun 10 14:36 0/ is only *750*
> Nodemanager user will not be able check for localization path exists or not.
> {{LocalResourcesTrackerImpl}}
> {code}
>     case REQUEST:
>       if (rsrc != null && (!isResourcePresent(rsrc))) {
>         LOG.info("Resource " + rsrc.getLocalPath()
>             + " is missing, localizing it again");
>         removeResource(req);
>         rsrc = null;
>       }
>       if (null == rsrc) {
>         rsrc = new LocalizedResource(req, dispatcher);
>         localrsrc.put(req, rsrc);
>       }
>       break;
> {code}
> *isResourcePresent* will always return false and same resource will be localized to {{0}}
to next unique number



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message