hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3578) mapred.system.dir should be accessible only to hadoop daemons
Date Tue, 07 Apr 2009 03:35:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696382#action_12696382 ]

Amar Kamat commented on HADOOP-3578:
------------------------------------

Here is the proposal :

_Terms :_
# mapred.system.dir : the common location where users (the jobclient) upload job files (the job
split and the job jar). This dir will have rwx-w--w- permissions.
# mapred.system.dir/jobtracker : jobtracker's private scratch space with rwx------ permissions.
This is the place where the jobtracker moves files upon successful job submission (upload
+ validation).
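
To make the two permission strings above easier to compare, here is a minimal sketch that converts a symbolic mode into its octal form. The class and method names (SystemDirModes, toOctal) are hypothetical and are not part of Hadoop; the sketch only illustrates the arithmetic and makes no claim about how the patch will set the modes.

```java
public final class SystemDirModes {

    /**
     * Converts a 9-character symbolic permission string (for example "rwx------")
     * into its numeric mode. Any character other than '-' counts as a set bit.
     */
    public static int toOctal(String symbolic) {
        if (symbolic == null || symbolic.length() != 9) {
            throw new IllegalArgumentException("expected 9 characters, e.g. rwx------");
        }
        int mode = 0;
        for (int i = 0; i < 9; i++) {
            mode <<= 1;                       // make room for the next permission bit
            if (symbolic.charAt(i) != '-') {
                mode |= 1;                    // bit is set
            }
        }
        return mode;
    }

    public static void main(String[] args) {
        // Proposed mode for mapred.system.dir
        System.out.println(Integer.toOctalString(toOctal("rwx-w--w-"))); // prints 722
        // Proposed mode for mapred.system.dir/jobtracker
        System.out.println(Integer.toOctalString(toOctal("rwx------"))); // prints 700
        // Current mode quoted in the issue description
        System.out.println(Integer.toOctalString(toOctal("rwx-wx-wx"))); // prints 733
    }
}
```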

The process of job submission is as follows:
# jobclient/user asks jobtracker for a new jobid
# jobclient generates a new x-digit random number and uploads the job files (split and jar)
to mapred.system.dir/jobid-random-number
# jobclient/user passes this information and the jobconf to the jobtracker via rpc (the submitJob
api).
# jobtracker loads the conf received via rpc and performs the acl check; only then is the job *accepted*
(moved to mapred.system.dir/jobtracker)
# jobtracker serializes the job.xml (changing the location of the split and jar files in the
conf) to mapred.system.dir/jobtracker/jobid, and moves job.jar and job.split to mapred.system.dir/jobtracker/jobid
(this is important because the tasktracker relies on the information in the conf to locate job.jar and job.split).

# Upon restart, all jobs present in mapred.system.dir/jobtracker/ will be blindly
loaded, and jobs left in mapred.system.dir/ will be queued for cleanup.
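
The steps above can be sketched roughly as follows. This is only an illustration on a local filesystem using java.nio.file, not the Hadoop FileSystem or JobTracker APIs; every name here (JobSubmissionSketch, upload, accept, staleStagingDirs, the aclOk flag) is hypothetical, and the rpc call and the job.xml rewrite are left out.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;

public final class JobSubmissionSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Client side: builds a staging dir name of the form jobid-<random digits>. */
    public static String stagingDirName(String jobId, int digits) {
        StringBuilder name = new StringBuilder(jobId).append('-');
        for (int i = 0; i < digits; i++) {
            name.append(RANDOM.nextInt(10));
        }
        return name.toString();
    }

    /** Client side: "uploads" (here: creates empty) job.jar and job.split in the staging dir. */
    public static Path upload(Path systemDir, String jobId, int digits) throws IOException {
        Path staging = Files.createDirectories(systemDir.resolve(stagingDirName(jobId, digits)));
        Files.write(staging.resolve("job.jar"), new byte[0]);
        Files.write(staging.resolve("job.split"), new byte[0]);
        return staging;
    }

    /** Jobtracker side: accepts a job only after the acl check, by moving it to jobtracker/jobid. */
    public static Path accept(Path systemDir, Path staging, String jobId, boolean aclOk)
            throws IOException {
        if (!aclOk) {
            throw new SecurityException("job " + jobId + " failed the acl check");
        }
        Path privateDir = Files.createDirectories(systemDir.resolve("jobtracker"));
        return Files.move(staging, privateDir.resolve(jobId));
    }

    /** Restart: everything in the system dir except jobtracker/ is a candidate for cleanup. */
    public static List<Path> staleStagingDirs(Path systemDir) throws IOException {
        List<Path> stale = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(systemDir)) {
            for (Path entry : entries) {
                if (!entry.getFileName().toString().equals("jobtracker")) {
                    stale.add(entry);
                }
            }
        }
        return stale;
    }
}
```

In this sketch a job that was uploaded but never accepted stays directly under the system dir, so staleStagingDirs returns it after a restart, while accepted jobs live only under jobtracker/.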

_Benefits :_
# guessing the job dir will be hard because a random number is appended to its name
# separation between faulty jobs (jobs failing on access etc) and accepted jobs will be clear
(helps in recovery)
# jobtracker system dir will be clean and cannot be garbled 
# jobconf need not be read from the fs as it will be passed via rpc; this helps in quickly
deciding whether the job is faulty or not
# re-initing jobtracker is as simple as deleting jobtracker's system.dir (mapred.system.dir/jobtracker)
without touching the mapred.system.dir

_Questions :_
# Should the default api assume that job.xml, job.jar and job.split are still present in mapred.system.dir/jobid?

----
Thoughts? Comments?

> mapred.system.dir should be accessible only to hadoop daemons 
> --------------------------------------------------------------
>
>                 Key: HADOOP-3578
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3578
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>
> Currently the jobclient accesses the {{mapred.system.dir}} to add job details. Hence
> the {{mapred.system.dir}} has the permissions of {{rwx-wx-wx}}. This could be a security loophole
> where the job files might get overwritten/tampered after the job submission.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

