hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4491) Per-job local data on the TaskTracker node should have right access-control
Date Mon, 13 Jul 2009 05:18:14 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730220#action_12730220

Hemanth Yamijala commented on HADOOP-4491:

Some other comments on the code:

- The code models some of the commands passed to the taskcontroller as 'first task' for JVM
or 'not first task' for JVM. This complicates the contract between the TT and the task-controller.
I think at a minimum these decisions should be restricted only to the JVM manager, and the
contract between the TT and task-controller should be handled by modelling new commands like
- The patch introduces a log directory argument to the task-controller. I think this is not
required as there is only one value for the hadoop.log.dir for all tasks.
- Code related to creation of work dirs is removed from TaskRunner. Where is it created now
- Permissions for job directory is changed multiple times - once per each task.
- finalizeTaskDirs (or parts of it) needs to be synchronized.
- In the case of jvm reuse, we still need to make the output available. Because it seems the
task completion event is sent as soon as the task is done not after jvm exits. This is only
a theoretical case possibly, but it still will be good to keep the code paths identical.
- Path permissions should be taken care of ? so, if mapred.local.dir doesn't have execute
permissions for others, we need to set them.
- In setup, we seem to be setting permissions even if directory creation fails. also shouldn't
we set permissions even if it exists.. so that it is right as per our requirement.
- Didn't understand the purpose of initStatus. Since it starts out being true, wouldn't it
always remain true ?
- Cache directory probably doesn't need 777 because it is not written to by the tasks. We
can probably retain this if HADOOP-4490 set like permissions, since this will be addressed
in HADOOP-4493.
- getBaseIntermediateOutputDir seems an overkill if it is just returning a constant.
- Changes in SpillRecord seem to be unnecessary.

Some nits:
- There are some else blocks without code, but with a code comment explaining why there's
no else block. While useful, it is somewhat unconventional. I would recommend the reason be
moved to a comment starting the if block itself.
- TODOs in the patch must be discussed and resolved.
- TaskController.FILE_PERMISSIONS doesn't seem to be used anywhere.
- Rename finalizeTaskDirs as finalizeTask - it matches with the naming convention for the
other apis in taskcontroller.
- Add a comment about why task-work is 755.
- mapred.child.local.dir has an extra comma at the end.

> Per-job local data on the TaskTracker node should have right access-control
> ---------------------------------------------------------------------------
>                 Key: HADOOP-4491
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4491
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Arun C Murthy
>            Assignee: Vinod K V
>         Attachments: HADOOP-4491-20090623-common.1.txt, HADOOP-4491-20090623-mapred.1.txt,
HADOOP-4491-20090703-common.1.txt, HADOOP-4491-20090703-common.txt, HADOOP-4491-20090703.1.txt,
HADOOP-4491-20090703.txt, HADOOP-4491-20090707-common.txt, HADOOP-4491-20090707.txt

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message