hadoop-common-dev mailing list archives

From "Vinod K V (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4491) Per-job local data on the TaskTracker node should have right access-control
Date Mon, 08 Jun 2009 13:42:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717253#action_12717253 ]

Vinod K V commented on HADOOP-4491:
-----------------------------------

Some (broad) proposals for solving this issue:

*Localization*

 (A) Move the whole of localization out of the TaskTracker (TT) so that it is done as the user.
    - Adv: Because everything is done by the user, the TT never has to change permissions back
and forth. We just need to support the TT reading the data back for serving.
    - Disadv: (As Devaraj pointed out in a quick chat) Synchronizing localization across the
different processes becomes quite complicated.

 (B) Separate tt-only and child-only spaces from a shared space. The tt-only and child-only spaces
are exclusively for the TT and the child respectively. The TT does localization in the tt-only area;
the task-controller binary then moves the directory structure to the child-only area. The shared
space is for the stuff generated by the child for the TT and has restricted access (511 on dirs and
444 on files) for the TT and others. Even though other users can read this area, they won't be able
to delete or write anything in it.
    - Adv: Keeps things very simple.
    - DisAdv: Sacrifices some of the strict 700 access restrictions in favour of more manageable
511/444 permissions.
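
To make the trade-off in (B) concrete, here is a minimal Python sketch that decodes the octal modes mentioned above using the standard `stat` module. The helper names `describe` and `others_can` are made up for illustration; this is not Hadoop code and says nothing about how the task-controller would actually set these modes.

```python
import stat

def describe(mode, is_dir):
    """Return the ls-style permission string for an octal mode."""
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(file_type | mode)

def others_can(mode, bit):
    """True if the given 'others' permission bit (e.g. stat.S_IROTH) is set."""
    return bool(mode & bit)

# The strict mode that (B) gives up in the shared space.
print("700 dir :", describe(0o700, is_dir=True))
# The modes proposed for the shared space.
print("511 dir :", describe(0o511, is_dir=True))
print("444 file:", describe(0o444, is_dir=False))

# Other users can read shared files and traverse shared dirs, but not write.
print("others read 444 file :", others_can(0o444, stat.S_IROTH))
print("others enter 511 dir :", others_can(0o511, stat.S_IXOTH))
print("others write 511 dir :", others_can(0o511, stat.S_IWOTH))
```

With 700, nothing is set for group or others. With 511 and 444, no class has the write bit set, which is why other users can read the shared area but cannot delete or write anything in it.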

 (C) Instead of separating the directory structures completely, use the same one for both the TT
and the user wherever necessary.
    - Adv: Avoids replication of the directory structure.
    - DisAdv: Paths closer to the mapred-local-dir are owned by the TT and paths further down
are owned by the child. Currently, tasks use the same mapred.local.dir as the TaskTracker. When
tasks need a path for writing their output, the LocalDirAllocator checks write permission on the
root directory, which is owned by the TT only, so the check would fail. We will have to handle this
by modifying the mapred-local-dir of the child.
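
The failure described for (C) can be sketched with a toy permission model. Everything here is an assumption made for illustration: the user names `tt` and `alice`, the 755 mode on the root directory, and the simplified owner-or-others check. It is not the actual LocalDirAllocator logic, only the shape of the problem.

```python
from dataclasses import dataclass

@dataclass
class Dir:
    owner: str
    mode: int  # octal permission bits, e.g. 0o700

def can_write(user, d):
    """Simplified write check: owner bit for the owner, others bit otherwise.
    Group permissions are ignored to keep the sketch short."""
    if user == d.owner:
        return bool(d.mode & 0o200)
    return bool(d.mode & 0o002)

# Hypothetical layout under (C): the root is owned by the TT,
# the per-job directory further down is owned by the job's user.
root = Dir(owner="tt", mode=0o755)        # assumed mode
job_dir = Dir(owner="alice", mode=0o700)

# A write check against the root fails for the task's user ...
print("alice can write root   :", can_write("alice", root))
# ... while the same check against the user-owned directory succeeds.
print("alice can write job_dir:", can_write("alice", job_dir))
```

In this model, pointing the child's local dir at the user-owned directory instead of the TT-owned root is what makes the write check pass.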

*Intermediate output*
 - If we choose (A) or (C) for localization, we need to run the task-controller again to make
the output accessible to the TT.
 - If we choose (B) for localization, intermediate output is automatically available to the
TT.

*Task logs*
 - If we choose (A) or (C), whenever there is a request for the logs, we need to run the
task-controller to stream them. Logs can be moved to a tt-accessible area once the task finishes.
 - If we choose (B), task logs can be put in the shared space readable by all users, and so are
automatically available.

Given these points, I think that even though (B) trades some of the strict 700 restrictions
for a more permissive 511/444, it keeps things simple. But I am open to other proposals too. Thoughts?

> Per-job local data on the TaskTracker node should have right access-control
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-4491
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4491
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Vinod K V
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

