hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1186) While localizing a DistributedCache file, TT sets permissions recursively on the whole base-dir
Date Wed, 23 Dec 2009 04:45:30 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793922#action_12793922

Hemanth Yamijala commented on MAPREDUCE-1186:

Some feedback:

- I would like to keep the call for cleanupStorage in TaskTracker.initialize(). It helps in
cases of abrupt TT shutdown. Admittedly, it may not be fully effective - because we would
not be able to cleanup files like we do in MAPREDUCE-896 (i.e. using the task-controller's
help). Still, it would cleanup whatever it can and that would be a considerable amount of
space under normal conditions, I think.
- In TrackerDistributedCacheManager.localizeCache where we construct DistributedCacheContext,
I think it makes sense to just call conf.get(JobContext.USER_NAME) rather than building a
new JobConf object for this one value.
- Should we rename DistributedCacheContext to DistributedCacheFileContext, as it is operating
on only one file ? Same for TaskController.initializeDistributedCache - change to initializeDistributedCacheFile.
Likewise in C code as well.
- Document DistributedCacheContext mentioning what it is useful for.
- Store the uniqueDir as a Path object instead of a string. This is consistent with our using
higher level objects like File in the Contexts.
- Remove LOG.info print in DefaultTaskController.initializeDistributedCache. Instead have
a LOG.warn in the exception handler.
- Guard the Debug logs using LOG.isDebugEnabled.
- Do not send full paths (as given in localizedUniqueDir) to task-controller. This is for
security purposes to prevent people from sending random paths and getting permissions changed
as a hack. Instead, we *hardcode* the path construction as much as possible inside the task-controller.
For e.g. in this case, if we send the selected base directory (mapred-local-dir), user and
random-id, we can easily construct the full path from the pattern such as $mapred-local-dir/taskTracker/$user/distcache/$random-id.
This is just another safeguard in case the task-controller is compromised somehow. This naturally
means the C code changes a bit as well, so the code will need to construct the path in initialize_distributed_cache
and then call secure_path. Please follow the pattern we use in other methods of task-controller
while doing this. Specifically:
-- We need to validate that the passed in mapred-local-dir is valid. Please take a look at
-- We also need to construct the path using methods like concatenate. Please take a look at
get_attempt_directory for an example.
- It would be very helpful to add a comment on why testDeleteCache needs to override configuration
to set a single mapred local dir.
- I would like to see a regression test for this, that ensures that file permissions are being
set only for the file being localized. So can we do something like this ? Localize a file.
After localization is complete, create a file under the directory where the file is localized
and ensure that it has permissions different from what is set by default. Then, localize another
file. After checking that the second file has the right permissions, verify that the file
created after the first localization has the permissions you set and not the default permissions.
This should work for both the LinuxTaskController case and the default case. 

Also, I would just like you to watch for changes in MAPREDUCE-896, which I suspect will most
likely clash with this patch. So, some merge may be required.

> While localizing a DistributedCache file, TT sets permissions recursively on the whole
> -----------------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-1186
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1186
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Vinod K V
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.22.0
>         Attachments: patch-1186-1.txt, patch-1186-2.txt, patch-1186-3-ydist.txt, patch-1186-3-ydist.txt,
patch-1186-ydist.txt, patch-1186-ydist.txt, patch-1186.txt
> This is a performance problem.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message