hadoop-mapreduce-issues mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-323) Improve the way job history files are managed
Date Tue, 03 Nov 2009 06:09:59 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772886#action_12772886 ]

Amareshwari Sriramadasu commented on MAPREDUCE-323:
---------------------------------------------------

bq. But, as I stated later, I think breaking directories by job-id makes lookup simpler and
gives us more explicit limits over directory sizes. So I'd prefer that to time-based directories.
Yes, I agree that breaking directories up by job-id makes things a lot simpler.
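
As a rough illustration of how a job-id based layout could bound directory sizes, here is a minimal sketch. It assumes job IDs of the usual form job_<jobtracker-identifier>_<sequence>. The bucket size JOBS_PER_DIR, the zero-padded bucket names, and the function name job_history_dir are hypothetical choices for this example, not something decided on this issue.

```python
import posixpath

# Hypothetical cap on history entries per directory; not a value from this issue.
JOBS_PER_DIR = 1000


def job_history_dir(history_root: str, job_id: str) -> str:
    """Map a job ID such as job_200911030609_0042 to a bucketed directory.

    The directory is derived from the ID alone, so a lookup needs no search:
    <history_root>/<jobtracker-identifier>/<bucket>.
    """
    prefix, jt_identifier, sequence = job_id.split("_")
    if prefix != "job":
        raise ValueError(f"not a job ID: {job_id!r}")
    # Integer division puts JOBS_PER_DIR consecutive jobs in each bucket.
    bucket = int(sequence) // JOBS_PER_DIR
    return posixpath.join(history_root, jt_identifier, f"{bucket:06d}")
```

With this scheme, job_200911030609_0042 and job_200911030609_0999 share a directory, and job_200911030609_1000 starts the next one, so no directory holds more than JOBS_PER_DIR jobs.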

bq. So, if we have all job history files in a single tree, then we'd want the directories
in that tree to be world readable, but the log files to be owned and readable by the job's
submitter.
To achieve this, the JobTracker would need super-user privileges on HDFS in order to chown
the log files to the job's submitter. If we assume the JT has super-user privileges on HDFS,
then we can go with the job-id based directory structure.

bq. Or, if we have per-user directories, we could make those readable only by that user, providing
greater privacy. Is this what you mean?
Yes, this is what I meant. In this case, admins will have to create per-user directories in
the history folder that the JobTracker can write to. The JT will not need super-user privileges here.
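
To make the trade-off between the two layouts concrete, here is a small sketch that only computes where a history file would go and whether the JobTracker would need to chown it. The names HistoryPlacement, single_tree, and per_user are made up for this example, and the sketch does not touch HDFS or model the exact permission bits.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryPlacement:
    path: str
    # True means the JobTracker must chown the file to the submitter,
    # which on HDFS requires super-user privileges.
    needs_chown: bool


def single_tree(history_root: str, user: str, job_id: str) -> HistoryPlacement:
    """One shared tree: world-readable directories, file owned by the submitter."""
    return HistoryPlacement(posixpath.join(history_root, job_id), needs_chown=True)


def per_user(history_root: str, user: str, job_id: str) -> HistoryPlacement:
    """Per-user directories, assumed to be created in advance by an admin."""
    return HistoryPlacement(posixpath.join(history_root, user, job_id), needs_chown=False)
```

For user amar, per_user("history-folder", "amar", job_id) yields a path under history-folder/amar/, matching the layout in the issue description, and reports that no chown is needed.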

> Improve the way job history files are managed
> ---------------------------------------------
>
>                 Key: MAPREDUCE-323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-323
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Amar Kamat
>            Assignee: Amareshwari Sriramadasu
>            Priority: Critical
>
> Today all the job history files are dumped in one _job-history_ folder. This can cause
> problems when the history folder needs to be searched (job recovery, etc.). It would be
> nice to group all the jobs under a _user_ folder, so all the jobs for user _amar_ would
> go in _history-folder/amar/_. Jobs can be categorized using various features like _jobid,
> date, jobname_ etc., but using _username_ will make the search much more efficient and
> will not result in namespace explosion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

