hadoop-common-dev mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job
Date Wed, 18 Feb 2009 05:49:02 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674476#action_12674476 ]

Hemanth Yamijala commented on HADOOP-4490:

Thanks for the review, Arun.

I had a discussion with Sreekanth about the changes, and we are proposing the following:

- Introduce a {{TaskTracker.initializeSystemDirs}} method. This will create $mapred.local.dir/taskTracker/jobCache/,
$mapred.local.dir/taskTracker/archive/, and $hadoop.log.dir/userlogs on all relevant disks.
As per Arun's comments, we will have this API in TaskTracker for now, and it will be called
at TaskTracker initialization time. If it is felt that this should be per TaskController,
we can easily move it to the TaskController API. I think the $mapred.local.dir/taskTracker/jobCache/
directory may need 777 permissions for now, because files in it will be created both by the task
and by the tracker: e.g., the task could create output directories on a new disk that has not
yet been touched by the tracker.

- Introduce a {{TaskController.initializeJob}} method. This will be called from {{TaskTracker.localizeJob}},
with the job ID as a parameter, and will set up access permissions for the $mapred.local.dir/taskTracker/jobCache/jobid
directories on all disks that have been touched by localization.

- Modify {{TaskController.launchTaskJVM}} to set up permissions for the log dir and the pid
dir associated with that task. This will remove the call to {{initializeTask}} from {{JvmManager.runChild}}.

- Modify {{TaskController.initializeTask}} to set up permissions for the log dir, pid dir,
and task cache dir for the task. There is no need to set anything up for the job here, because
{{initializeJob}} has already done that. We will still need to repeat the permission setting
for the log dir and pid dir.

- Modify {{DistributedCache.localizeCache}} to set up permissions for the localized files.
We propose to recursively set hardcoded 755 permissions on all files under the $mapred.local.dir/taskTracker/archive/
directory for now. This might repeatedly set permissions on files that are already correctly
set up, but it keeps things simple. If it turns out to be a performance issue, it is easy to
address by setting permissions only on the files being localized and walking up their parent
paths. Please let me know if this seems like a bad choice.

The above changes will mean we can remove:

- {{DistributedCacheFileAccessInfo}}
- {{FileUtil.setPermissionsForPathComponents}}
- {{TaskController.cleanup}}
- The {{runningJobs}} state maintained by {{LinuxTaskController}}.

Arun, does this tie in with your expectations?

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

This message is automatically generated by JIRA.
