hadoop-mapreduce-issues mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1186) While localizing a DistributedCache file, TT sets permissions recursively on the whole base-dir
Date Mon, 09 Nov 2009 13:28:33 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774963#action_12774963 ]

Hemanth Yamijala commented on MAPREDUCE-1186:

Some history:

Before HADOOP-4490, files and archives localized as part of the distributed cache were given
executable permissions. I suppose the assumption was that the directories and files created
during localization automatically had read permissions for their owner. And since that owner,
the user the tasktracker runs as, was also the owner of the task process, this was sufficient
to access the cache files.

In HADOOP-4490, we had a situation where the tasktracker and the task could run as different
users. The tracker localizes the files and the task needs to access them, so at a minimum,
read and execute permissions on directories and files had to be granted to others. As mentioned
in the comment linked above, a choice was made to set these permissions recursively on all
files starting from the base directory, which became a performance problem on clusters with
a very large number of localized cache files.

In MAPREDUCE-856, to meet the requirement of securing access to the distributed cache files,
the local directory structure was changed to be per user. Further, in the LinuxTaskController,
all files under a user's archive folder were changed to be owned by that user, with permissions
giving access only to that user. For the DefaultTaskController, the same changes as
made in HADOOP-4490 were retained, though they were possibly unnecessary.

First, to revisit whether we need any permission setting for distributed cache files:

I think this is still required. For the DefaultTaskController, executable permissions need
to be set on the localized files as in the pre-HADOOP-4490 days. For the LinuxTaskController,
we need to change ownership and set permissions in the task controller for that user.

However, in both cases, I suppose we only need to set permissions for files that are actually
copied from DFS to the local file system (including any directories created in this process).
This will address the issue raised in this JIRA.
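
To illustrate the idea of touching only what localization itself creates, here is a minimal
sketch. It is not TaskTracker code: the class and method names are hypothetical, writing a
byte array stands in for the copy from DFS, and permissions are set with the plain
java.io.File setters rather than any Hadoop utility. The existence check followed by the
create is not atomic, so real code would have to deal with concurrent localizations.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Hadoop.
final class TargetedCachePermissions {

    /** Creates dir and any missing ancestors; returns only the directories this call created. */
    static List<Path> createDirsTracking(Path dir) throws IOException {
        List<Path> created = new ArrayList<>();
        Path p = dir.toAbsolutePath().normalize();
        // Walk upward, collecting ancestors that do not exist yet.
        while (p != null && !Files.exists(p)) {
            created.add(0, p); // keep parent-before-child order
            p = p.getParent();
        }
        Files.createDirectories(dir);
        return created;
    }

    /** Grants read and execute to everyone on exactly the given paths; no recursion. */
    static void openToOthers(List<Path> paths) throws IOException {
        for (Path p : paths) {
            boolean ok = p.toFile().setReadable(true, false)
                    && p.toFile().setExecutable(true, false);
            if (!ok) {
                throw new IOException("could not set permissions on " + p);
            }
        }
    }

    /** Stand-in for localizing one cache file; returns every path it created and chmod-ed. */
    static List<Path> localize(Path destFile, byte[] contents) throws IOException {
        Path file = destFile.toAbsolutePath().normalize();
        List<Path> touched = createDirsTracking(file.getParent());
        Files.write(file, contents);
        touched.add(file);
        openToOthers(touched);
        return touched;
    }
}
```

With this shape, the cost of a localization depends on the depth of the new path and not on
how many files already sit under the base directory, since existing siblings are never visited.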

> While localizing a DistributedCache file, TT sets permissions recursively on the whole base-dir
> -----------------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-1186
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1186
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Vinod K V
>             Fix For: 0.21.0
> This is a performance problem.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.