From: "Vinod K V (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Date: Mon, 8 Jun 2009 06:42:07 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-4491) Per-job local data on the TaskTracker node should have right access-control
Message-ID: <562528964.1244468527441.JavaMail.jira@brutus>
In-Reply-To: <1677071153.1224707384211.JavaMail.jira@brutus>

[
https://issues.apache.org/jira/browse/HADOOP-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717253#action_12717253 ]

Vinod K V commented on HADOOP-4491:
-----------------------------------

Some (broad) proposals for solving this issue:

*Localization*

(A) Move the whole localization out of the TaskTracker, to be done as the user.
 - Adv: Because everything is done by the user, there is no hassle of changing permissions back and forth in the TT. We only need to support the TT reading the data back for serving.
 - Disadv: (As Devaraj pointed out in a quick chat) Synchronizing localization across the different processes becomes quite complicated.

(B) Separate the tt-only and child-only spaces from the shared space. The tt-only and child-only spaces are exclusively for the TT and the child respectively. The TT does localization in the tt-only area; the task-controller binary then moves the directory structure to the child-only area. The shared space is for the stuff generated by the child for the TT and has restricted access (511 on dirs and 444 on files) for the TT and others. Even though other users can read this area, they won't be able to delete or write anything.
 - Adv: Keeps things very simple.
 - Disadv: Sacrifices some of the strict 700 access restrictions in favour of the more manageable 511/444 permissions.

(C) Instead of separating the directory structures completely, use the same structure for both the TT and the user wherever necessary.
 - Adv: Avoids replication of the directory structure.
 - Disadv: Paths closer to the mapred-local-dir are owned by the TT, and paths further down are owned by the child. Currently, tasks use the same mapred.local.dir as the TaskTracker. When a task needs a path for writing its output, the LocalDirAllocator checks write permission on the root directory, which is owned by the TT only, so the check would fail. We will have to handle this by modifying the mapred-local-dir of the child.
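To make the 511/444 modes of proposal (B) concrete, here is a minimal sketch in Python. It is not Hadoop code and not the task-controller's behaviour; the function names and the `shared/job_0001/file.out` layout are hypothetical and only illustrate the modes. It assumes a POSIX file system.

```python
import os
import shutil
import stat
import tempfile

DIR_MODE = 0o511   # r-x--x--x: owner may list and traverse, others may only traverse
FILE_MODE = 0o444  # r--r--r--: readable by everyone, writable by no one


def restrict_shared_tree(root):
    """Apply the proposal (B) modes to every directory and file under root."""
    # Walk bottom-up so children are handled before their parent directory.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            os.chmod(os.path.join(dirpath, name), FILE_MODE)
        os.chmod(dirpath, DIR_MODE)


def mode_of(path):
    """Return only the permission bits of path."""
    return stat.S_IMODE(os.stat(path).st_mode)


def make_writable_again(root):
    """Restore owner write access on directories so the demo tree can be deleted."""
    for dirpath, _dirnames, _filenames in os.walk(root):
        os.chmod(dirpath, 0o700)


def demo():
    """Build a throwaway shared area, restrict it, and report the resulting modes."""
    base = tempfile.mkdtemp()
    shared_root = os.path.join(base, "shared")          # hypothetical shared space
    job_dir = os.path.join(shared_root, "job_0001")     # hypothetical job directory
    os.makedirs(job_dir)
    out_file = os.path.join(job_dir, "file.out")
    with open(out_file, "w") as f:
        f.write("intermediate output\n")

    restrict_shared_tree(shared_root)
    result = (mode_of(job_dir), mode_of(out_file))

    # Clean up: directories without the write bit cannot have entries removed.
    make_writable_again(base)
    shutil.rmtree(base)
    return result


if __name__ == "__main__":
    dir_mode, file_mode = demo()
    print(oct(dir_mode), oct(file_mode))
```

With 511 on a directory, no one (including the owner) has the write bit, so entries cannot be created, renamed, or deleted there by non-privileged users; users other than the owner can traverse the directory but cannot list it, so they need to know a file's name to read it.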

*Intermediate output*
 - If we choose (A) or (C) for localization, we need to run the task-controller again to make the output accessible to the TT.
 - If we choose (B) for localization, intermediate output is automatically available to the TT.

*Task logs*
 - If we choose (A) or (C), whenever there is a request for the logs, we need to run the task-controller to stream them. Logs can be moved to a tt-accessible area once the task finishes.
 - If we choose (B), task logs can be put in the shared space readable by all users, and so are automatically available.

Given these trade-offs, I think that even though (B) trades some of the strict 700 restrictions for the looser 511/444, it keeps things simple. But I am open to other proposals too.

Thoughts?

> Per-job local data on the TaskTracker node should have right access-control
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-4491
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4491
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Vinod K V
>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.