hadoop-mapreduce-issues mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-181) Secure job submission
Date Fri, 04 Sep 2009 12:10:57 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751438#action_12751438 ]

Amar Kamat commented on MAPREDUCE-181:

Here is the final proposal:
# Here is how the handshake happens for job submission
 ## jobclient asks the jobtracker for a new jobid (jobtracker maintains a mapping from job-id
to user-name [ugi]. This user is the owner of the job and will be allowed to submit the job)
 ## using the Input-split, the jobclient constructs a split _meta-info_ for the jobtracker
to be able to create the task->node locality cache. 
   job-split-meta-info :
       - split-location (location of the actual split/raw-bytes)
       - split class (used to reinstantiate the split object)
       - split-info (array of individual split meta-info)

   split-meta-info :
       - locations (hostnames where this split is local)
       - start offset (start in raw-bytes)
       - length (total bytes in the corresponding raw-bytes)
       - data-size (total data that will be processed in this split)
 ## with this new id, the jobclient uploads job.xml, job.split, job.jar and archives/libs to
a staging area (/user/_user-name_/.staging/_jobid_/). job.xml is staged to support jobtracker.getJobFile()
 ## after the upload is done, the jobclient submits a job by passing job-id, job-conf and
job-split-meta-info via rpc.
 ## jobtracker does the following things upon a submitjob request
  ### validate conf (includes queue checks, acl checks etc. along with user-name checks [conf.username
and owner match] and ownership checks [the caller of getnewid() and submitjob() must be the same])
  ### serialize conf to mapred.system.dir/jobid/job.xml (for restarts)
  ### serialize split-meta-info to mapred.system.dir/jobid/job.split
  ### start the job, i.e. create the jobinprogress
 ## when a tasktracker comes asking for a task, the jobtracker passes the split meta-info (along with
split-location and split-classname). The tasktracker uses this meta-info for reading the split
 ## tasktracker now localizes the job.jar from /user/_user-name_/.staging/_job-id_/job.jar
and then unjars it. This is done using the job-conf (having user-credentials)
 ## mapred.system.dir can now be 700 and only accessible to mapred daemons 
 ## readFields() in jobconf caps the total characters in jobconf. This prevents users from
passing huge job-confs. For now the limit is 3*1024*1024 chars
 ## job-split meta-info is also capped in readFields() to accept only split meta-info < 10 MB.
 ## since jobtracker.getNewJobId() maintains a mapping from jobid to username, the jobtracker
needs to clean up this mapping after some timeout. One way to implement the timeout is to use a thread
that periodically cleans up this mapping.
 ## Upon job completion, the jobcleanup code cleans up the staging folder, i.e. /user/_user-name_/.staging/_job-id_/.
 ## if the jobclient crashes or fails to submit the job, the temp files in /user/_user-name_/.staging/_job-id_/
are not deleted, as they can be used for debugging purposes.
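
The job-id ownership mapping and its timeout cleanup could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class name JobOwnerRegistry, the methods isOwner() and expire(), the "job_N" id format and the explicit clock argument are all hypothetical and are not taken from the patch. A cleanup thread would simply call expire() periodically.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: tracks which user obtained which job-id.
final class JobOwnerRegistry {
    // One issued id: the requesting user and the time the id was handed out.
    private static final class Entry {
        final String user;
        final long issuedAtMillis;

        Entry(String user, long issuedAtMillis) {
            this.user = user;
            this.issuedAtMillis = issuedAtMillis;
        }
    }

    private final Map<String, Entry> owners = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private int nextId = 1;

    JobOwnerRegistry(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Hand out a new job-id and remember who asked for it.
    synchronized String getNewJobId(String user, long nowMillis) {
        String jobId = "job_" + nextId++;
        owners.put(jobId, new Entry(user, nowMillis));
        return jobId;
    }

    // Ownership check for submitjob: the caller must be the user who obtained the id.
    boolean isOwner(String jobId, String user) {
        Entry e = owners.get(jobId);
        return e != null && e.user.equals(user);
    }

    // One pass of the periodic cleanup: drop mappings that reached the timeout.
    int expire(long nowMillis) {
        int removed = 0;
        for (Map.Entry<String, Entry> me : owners.entrySet()) {
            boolean stale = nowMillis - me.getValue().issuedAtMillis >= ttlMillis;
            if (stale && owners.remove(me.getKey(), me.getValue())) {
                removed++;
            }
        }
        return removed;
    }
}
```

Passing the current time in as an argument (instead of reading the system clock) is a choice made here only to keep the sketch easy to exercise; what should happen to the mapping once a job has been submitted is left open.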

# Upon restart, the mapred.system.dir can be completely trusted, and hence no checking is done.
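
For illustration, the job-split-meta-info layout and the size cap in readFields() might look something like the Java sketch below. It uses only java.io rather than the Hadoop Writable classes, and the field names, the length-prefixed framing and the MAX_BYTES constant are assumptions made for this example, not the format used by the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch of job-split-meta-info with a capped readFields().
final class JobSplitMetaInfo {
    static final int MAX_BYTES = 10 * 1024 * 1024; // accept only meta-info < 10 MB

    // Per-split meta-info: locality hosts, start offset, length, data size.
    static final class SplitMetaInfo {
        final String[] locations;
        final long startOffset;
        final long length;
        final long dataSize;

        SplitMetaInfo(String[] locations, long startOffset, long length, long dataSize) {
            this.locations = locations;
            this.startOffset = startOffset;
            this.length = length;
            this.dataSize = dataSize;
        }
    }

    String splitLocation;   // location of the actual split raw bytes
    String splitClass;      // class used to reinstantiate the split object
    SplitMetaInfo[] splits; // array of individual split meta-info

    // Assumed framing: a 4-byte body length followed by the body itself.
    void write(DataOutput out) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream body = new DataOutputStream(buf);
        body.writeUTF(splitLocation);
        body.writeUTF(splitClass);
        body.writeInt(splits.length);
        for (SplitMetaInfo s : splits) {
            body.writeInt(s.locations.length);
            for (String host : s.locations) {
                body.writeUTF(host);
            }
            body.writeLong(s.startOffset);
            body.writeLong(s.length);
            body.writeLong(s.dataSize);
        }
        body.flush();
        out.writeInt(buf.size());
        out.write(buf.toByteArray());
    }

    // Reject oversized input before allocating anything for it.
    void readFields(DataInput in) throws IOException {
        int size = in.readInt();
        if (size < 0 || size >= MAX_BYTES) {
            throw new IOException("split meta-info too large: " + size);
        }
        byte[] raw = new byte[size];
        in.readFully(raw);
        DataInputStream body = new DataInputStream(new ByteArrayInputStream(raw));
        splitLocation = body.readUTF();
        splitClass = body.readUTF();
        SplitMetaInfo[] parsed = new SplitMetaInfo[checkCount(body.readInt(), size)];
        for (int i = 0; i < parsed.length; i++) {
            String[] hosts = new String[checkCount(body.readInt(), size)];
            for (int j = 0; j < hosts.length; j++) {
                hosts[j] = body.readUTF();
            }
            parsed[i] = new SplitMetaInfo(hosts, body.readLong(), body.readLong(), body.readLong());
        }
        splits = parsed;
    }

    // An element count can never exceed the number of bytes in the body.
    private static int checkCount(int count, int size) throws IOException {
        if (count < 0 || count > size) {
            throw new IOException("corrupt split meta-info count: " + count);
        }
        return count;
    }
}
```

The same idea (check a declared size against a fixed limit before reading the payload) would apply to the jobconf limit of 3*1024*1024 chars.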

> Secure job submission 
> ----------------------
>                 Key: MAPREDUCE-181
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-181
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>         Attachments: hadoop-3578-branch-20-example-2.patch, hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, HADOOP-3578-v2.7.patch, MAPRED-181-v3.8.patch
> Currently the jobclient accesses the {{mapred.system.dir}} to add job details. Hence
> the {{mapred.system.dir}} has the permissions of {{rwx-wx-wx}}. This could be a security loophole
> where the job files might get overwritten or tampered with after the job submission.

