hadoop-mapreduce-issues mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-181) mapred.system.dir should be accessible only to hadoop daemons
Date Tue, 18 Aug 2009 11:44:15 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744467#action_12744467 ]

Devaraj Das commented on MAPREDUCE-181:
---------------------------------------

I wonder whether it makes sense to have the jobclient write two files for each split file:

1) the splits info (the actual bytes), written to a secure location on HDFS (with permissions
700)
2) the split metadata, which is a set of entries like {<map-id>:<location_1><location_2>..<location_n>,
<start-offset-in-split-file><length>}, one for each map-id. This is serialized over
RPC, and the JobTracker writes it to the well-known mapred-system-directory (which the JobTracker
owns with perms 700).
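
To make the metadata entry in (2) concrete, here is a minimal sketch of one entry and a compact binary encoding for it. The class name SplitMetaEntry and its field names are hypothetical, and plain java.io data streams stand in for whatever serialization the RPC layer would actually use; none of this is existing Hadoop code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical class for illustration; not part of the Hadoop code base.
final class SplitMetaEntry {
    final int mapId;           // <map-id>
    final String[] locations;  // <location_1>..<location_n>
    final long startOffset;    // <start-offset-in-split-file>
    final long length;         // <length>

    SplitMetaEntry(int mapId, String[] locations, long startOffset, long length) {
        this.mapId = mapId;
        this.locations = locations;
        this.startOffset = startOffset;
        this.length = length;
    }

    // Encode the entry; the result is what would travel over RPC.
    byte[] toBytes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(mapId);
            out.writeInt(locations.length);
            for (String host : locations) {
                out.writeUTF(host);
            }
            out.writeLong(startOffset);
            out.writeLong(length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Decode an entry previously produced by toBytes().
    static SplitMetaEntry fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int mapId = in.readInt();
            String[] locations = new String[in.readInt()];
            for (int i = 0; i < locations.length; i++) {
                locations[i] = in.readUTF();
            }
            long startOffset = in.readLong();
            long length = in.readLong();
            return new SplitMetaEntry(mapId, locations, startOffset, length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each entry carries only host names and two longs, never the user's split bytes, so its size stays small and easy to bound.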

The JobTracker just reads/loads the metadata, and creates the TIP cache.

The TaskTracker is handed a split object that looks something like {<start-offset-in-split-file><length>}.
As part of task localization, the TT copies the specific bytes from the split file (securely)
and launches the task, which then reads the split; alternatively, the TT could simply stream
the bytes over RPC to the child. The replication factor could be set to a high number for the
splits info file.
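
The localization step boils down to reading one byte range out of the shared split file. A rough sketch follows, using a local file and java.io.RandomAccessFile as a stand-in for an HDFS input stream; the class and method names are hypothetical, and the permission checks that make the copy secure are not shown.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; a local file stands in for the HDFS split file.
final class SplitSliceReader {
    // Return exactly `length` bytes starting at `startOffset` in the split file.
    static byte[] readSlice(String splitFile, long startOffset, int length) {
        try (RandomAccessFile in = new RandomAccessFile(splitFile, "r")) {
            if (startOffset < 0 || length < 0 || startOffset + length > in.length()) {
                throw new IllegalArgumentException("slice is outside the split file");
            }
            byte[] slice = new byte[length];
            in.seek(startOffset);   // jump to this task's portion
            in.readFully(slice);    // read all of it or fail
            return slice;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The task only ever sees its own slice, so one map's split contents are not exposed to another.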


Doing it this way should reduce the size of the split file information considerably (and
we can have a cap on the metadata size as well), and also provide security for the user-generated
split files' content.

For the JobConf, passing only the basic, minimum info to the JobTracker, as Hong suggested
on MAPREDUCE-841, seems to make sense. For all other conf properties, the Task can load them
directly from HDFS. The max size (in bytes) of the basic information could be
easily derived, and we could have a cap on that for the RPC communication.
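
As a sketch of what such a cap could look like, the check below sums the UTF-8 size of the basic properties and rejects the submission if it exceeds a limit. The class name, method names, and the idea of passing the limit as a parameter are assumptions for illustration only, not an existing Hadoop mechanism.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Hadoop code base.
final class BasicJobInfoCap {
    // Total UTF-8 size, in bytes, of all keys and values.
    static int encodedSize(Map<String, String> props) {
        int total = 0;
        for (Map.Entry<String, String> e : props.entrySet()) {
            total += e.getKey().getBytes(StandardCharsets.UTF_8).length;
            total += e.getValue().getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // Reject the basic info before it is sent over RPC if it exceeds the cap.
    static void checkWithinCap(Map<String, String> props, int maxBytes) {
        int size = encodedSize(props);
        if (size > maxBytes) {
            throw new IllegalArgumentException(
                "basic job info is " + size + " bytes, cap is " + maxBytes);
        }
    }
}
```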

Thoughts?

> mapred.system.dir should be accessible only to hadoop daemons 
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-181
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-181
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>         Attachments: hadoop-3578-branch-20-example-2.patch, hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, HADOOP-3578-v2.7.patch
>
>
> Currently the jobclient accesses the {{mapred.system.dir}} to add job details. Hence
> the {{mapred.system.dir}} has the permissions of {{rwx-wx-wx}}. This could be a security loophole
> where the job files might get overwritten/tampered after the job submission.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

