hadoop-common-dev mailing list archives

From "Vinod K V (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-4491) Per-job local data on the TaskTracker node should have right access-control
Date Mon, 08 Jun 2009 12:58:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716620#action_12716620 ]

Vinod K V edited comment on HADOOP-4491 at 6/8/09 5:56 AM:
-----------------------------------------------------------

I am summarizing the current state of local data management on the TaskTracker (TT).

*Localization of task on the TT:*

 - Job localization
   -- happens once per job
   -- creates the taskTracker/jobcache/jobid and taskTracker/jobcache/jobid/work directories recursively
   -- downloads job.xml to taskTracker/jobcache/jobid/job.xml and the job jar to taskTracker/jobcache/jobid/jars/job.jar

 - Task localization
   -- happens once per task
   -- creates task's work directory recursively: taskTracker/jobcache/jobid/taskid[.cleanup]/work
   -- if needed, localizes archives and files for the distributed cache to taskTracker/archive
and/or creates symlinks in the task's work directory, and rewrites job.xml
   -- creates the mapred.child.tmp directory
   -- creates hadoop.log.dir|userlogs/taskid/ recursively and sets up the child to redirect its
stdout and stderr to this directory
   -- creates taskjvm.sh when the Linux task controller is used
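
The directory-creation steps above can be sketched in a few lines. This is only an illustration of the path layout described in this comment, not the TaskTracker's actual code: the function names `localize_job` and `localize_task`, the `local_root` parameter and the sample ids are all hypothetical, and the downloads of job.xml and job.jar are represented only by their target paths.

```python
import tempfile
from pathlib import Path


def localize_job(local_root: Path, job_id: str) -> dict:
    """Once per job: create the job directories and return the paths listed above."""
    job_dir = local_root / "taskTracker" / "jobcache" / job_id
    work_dir = job_dir / "work"
    # "recursively" in the description corresponds to parents=True here.
    work_dir.mkdir(parents=True, exist_ok=True)
    return {
        "job_dir": job_dir,
        "work_dir": work_dir,
        # Download targets only; this sketch does not fetch anything.
        "job_xml": job_dir / "job.xml",
        "job_jar": job_dir / "jars" / "job.jar",
    }


def localize_task(local_root: Path, job_id: str, task_id: str,
                  cleanup: bool = False) -> Path:
    """Once per task: create taskid[.cleanup]/work under the job directory."""
    suffix = ".cleanup" if cleanup else ""
    task_work = (local_root / "taskTracker" / "jobcache" / job_id
                 / (task_id + suffix) / "work")
    task_work.mkdir(parents=True, exist_ok=True)
    return task_work


if __name__ == "__main__":
    # Hypothetical ids, used only to show the resulting relative paths.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        job = localize_job(root, "job_0001")
        task_work = localize_task(root, "job_0001", "task_0001", cleanup=True)
        print(job["job_jar"].relative_to(root))
        print(task_work.relative_to(root))
```

Both functions are idempotent (`exist_ok=True`), which matches the "once per job" and "once per task" descriptions even if a step is retried.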

*Intermediate output files*
 - All intermediate files are created by tasks, which run as the user, in taskTracker/jobcache/jobid/taskid[.cleanup]/output/.
This directory is created recursively on demand by the child JVM.

*Intermediate output serving*
 - The TaskTracker reads the intermediate map output files directly from taskTracker/jobcache/jobid/taskid[.cleanup]/output/
and serves them to reduces via MapOutputServlet

*Task logs' serving*
 - Each task's syslog is created by the child JVM in hadoop.log.dir|userlogs/taskid, as the file syslog
 - The TaskTracker reads the tasks' logs directly and serves them via TaskLogServlet
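
As a small illustration of where these log files live, the sketch below computes the three per-task log paths. It assumes that `hadoop.log.dir|userlogs` denotes a `userlogs` directory directly under the directory named by `hadoop.log.dir`; the helper name `task_log_paths` and the sample values are hypothetical.

```python
from pathlib import PurePosixPath

# The three per-task log files named in this comment.
LOG_FILES = ("stdout", "stderr", "syslog")


def task_log_paths(hadoop_log_dir: str, task_id: str) -> dict:
    """Map each log name to its path under <hadoop.log.dir>/userlogs/<taskid>/."""
    # Assumption: userlogs is a direct child of hadoop.log.dir.
    task_log_dir = PurePosixPath(hadoop_log_dir) / "userlogs" / task_id
    return {name: task_log_dir / name for name in LOG_FILES}
```

A servlet-like reader would look up one of these three paths for a given task id; the files themselves are written by the child.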

In summary, the current directory structure is as follows. Unless otherwise stated, directories/files
are owned by TT but used by both TT and child.
{noformat}
taskTracker
    |- archive
    |- jobcache
        |- jobid
            |- work
            |- job.xml
            |- jars/job.jar
            |- taskid[.cleanup]
                |- work
                    |- job.xml (task-specific)
                    |- taskjvm.sh (created and used by TT)
                    |- output (owned by child, used by TT)
                        |- all intermediate output files (owned by child, used by TT)

mapred.child.tmp (owned and used by child)

hadoop.log.dir|userlogs (owned and used by child)
    |- taskid       (owned and used by child)
        |- stdout   (owned by child, used by TT)
        |- stderr   (owned by child, used by TT)
        |- syslog   (owned by child, used by TT)
{noformat}
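
The ownership annotations in the tree can also be written down as data, which makes it easy to list the entries that one user owns and the other reads. The sketch below is a hedged illustration only: the `Entry` type, the `LAYOUT` list and the helper name are hypothetical, entry names are shortened labels from the tree rather than full paths, and taskjvm.sh is assumed to be TT-owned because the tree says it is created by TT.

```python
from typing import NamedTuple

TT = "TT"
CHILD = "child"
BOTH = frozenset({TT, CHILD})


class Entry(NamedTuple):
    name: str            # label taken from the tree, not a full path
    owner: str           # TT or CHILD
    used_by: frozenset   # who uses the entry, per the annotations


# Unannotated entries follow the stated default: owned by TT, used by both.
LAYOUT = [
    Entry("archive", TT, BOTH),
    Entry("jobcache/jobid/work", TT, BOTH),
    Entry("jobcache/jobid/job.xml", TT, BOTH),
    Entry("jobcache/jobid/jars/job.jar", TT, BOTH),
    Entry("taskjvm.sh", TT, frozenset({TT})),  # assumption: created by TT implies TT-owned
    Entry("output", CHILD, frozenset({TT})),
    Entry("intermediate output files", CHILD, frozenset({TT})),
    Entry("mapred.child.tmp", CHILD, frozenset({CHILD})),
    Entry("userlogs", CHILD, frozenset({CHILD})),
    Entry("userlogs/taskid", CHILD, frozenset({CHILD})),
    Entry("stdout", CHILD, frozenset({TT})),
    Entry("stderr", CHILD, frozenset({TT})),
    Entry("syslog", CHILD, frozenset({TT})),
]


def child_owned_used_by_tt(layout):
    """Names of entries the child owns but the TaskTracker uses."""
    return [e.name for e in layout if e.owner == CHILD and TT in e.used_by]


def tt_owned_used_by_child(layout):
    """Names of entries the TaskTracker owns but the child also uses."""
    return [e.name for e in layout if e.owner == TT and CHILD in e.used_by]
```

Entries returned by either helper are the ones where two different users touch the same file or directory, which is presumably where the access-control question of this issue arises.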


> Per-job local data on the TaskTracker node should have right access-control
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-4491
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4491
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Vinod K V
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

