hadoop-mapreduce-issues mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-323) Improve the way job history files are managed
Date Tue, 03 Nov 2009 19:01:32 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773118#action_12773118 ]

Doug Cutting commented on MAPREDUCE-323:

> Yes. I agree that breaking by job-ids make things a lot simpler.

Great!  Does anyone else have concerns with that approach?
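For illustration only, here is one hypothetical way a job-id-based layout could work. It assumes job IDs have the usual job_<jobtracker-identifier>_<sequence> form (e.g. job_200911031901_0001). The bucket size, the directory names, and the historyDirFor helper are assumptions and are not taken from any MAPREDUCE-323 patch. The job ID alone determines the directory, so a lookup by job ID needs no scan. Bucketing the sequence number also caps how many entries land in any one directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical job-id-based history layout; a sketch, not Hadoop's actual scheme.
final class JobIdHistoryLayout {
    // Assumed bucket size: at most this many jobs share one leaf directory.
    static final int JOBS_PER_BUCKET = 1000;

    // Maps job_200911031901_0001 to <root>/200911031901/000/job_200911031901_0001.
    static Path historyDirFor(Path root, String jobId) {
        String[] parts = jobId.split("_");
        if (parts.length != 3 || !parts[0].equals("job")) {
            throw new IllegalArgumentException("unexpected job id: " + jobId);
        }
        String jtIdentifier = parts[1];              // jobtracker start identifier
        int sequence = Integer.parseInt(parts[2]);   // per-jobtracker job counter
        String bucket = String.format("%03d", sequence / JOBS_PER_BUCKET);
        return root.resolve(jtIdentifier).resolve(bucket).resolve(jobId);
    }

    public static void main(String[] args) {
        Path root = Paths.get("history");
        System.out.println(historyDirFor(root, "job_200911031901_0001"));
        System.out.println(historyDirFor(root, "job_200911031901_1234"));
    }
}
```

On a Unix filesystem this prints history/200911031901/000/job_200911031901_0001 followed by history/200911031901/001/job_200911031901_1234.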

> admins will have to create per-user directories in history folder where JobTracker can
> write to

Another possibility might be that jobs can specify a group permitted to read its logs.  The
jobtracker would chgrp the logs to that group.  The jobtracker's uid would need to be a member
of that group.  The difference is that, rather than having to configure each filesystem for
each user, one can just configure the user/groups database.  Another difference is that this
would permit logs to be readable by more than the single user who submitted the job.  But
this is all stuff for later...
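The group-based alternative can be modeled in a few lines. This is a hypothetical sketch, not a Hadoop API. The user names (mapred as the jobtracker's user, plus amar, doug and eve), the group name, and the in-memory map standing in for the user/groups database are all made up. The sketch encodes the two constraints from the comment. First, the jobtracker can only chgrp a log to a group its own uid belongs to, which is the usual POSIX rule for non-superusers. Second, once the log has that group, every member of the group can read it, not just the submitter.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical model of group-readable history logs; not a Hadoop API.
final class HistoryLogAccess {
    // Stand-in for the user/groups database: user name -> groups it belongs to.
    private final Map<String, Set<String>> groupsByUser;

    HistoryLogAccess(Map<String, Set<String>> groupsByUser) {
        this.groupsByUser = groupsByUser;
    }

    boolean isMember(String user, String group) {
        return groupsByUser.getOrDefault(user, Set.of()).contains(group);
    }

    // POSIX-style rule: a non-superuser owner may only chgrp a file
    // to a group it is itself a member of.
    boolean canChgrp(String fileOwner, String newGroup) {
        return isMember(fileOwner, newGroup);
    }

    // Assuming a mode like 0640: the owner and any member of the file's group can read.
    boolean canRead(String user, String fileOwner, String fileGroup) {
        return user.equals(fileOwner) || isMember(user, fileGroup);
    }

    public static void main(String[] args) {
        HistoryLogAccess access = new HistoryLogAccess(Map.of(
                "mapred", Set.of("mapred", "analytics"), // jobtracker's user
                "amar", Set.of("analytics"),             // job submitter
                "doug", Set.of("analytics"),             // another member of the group
                "eve", Set.of("eng")));                  // not in the group
        // The job asked for group "analytics"; the jobtracker is a member, so chgrp is allowed.
        System.out.println(access.canChgrp("mapred", "analytics"));        // true
        System.out.println(access.canRead("doug", "mapred", "analytics")); // true
        System.out.println(access.canRead("eve", "mapred", "analytics"));  // false
    }
}
```

In a real deployment the chgrp itself would be a filesystem call (for HDFS, FileSystem.setOwner with a null user name and the target group name). That call is outside this sketch, which only models who is allowed to do what.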

> Improve the way job history files are managed
> ---------------------------------------------
>                 Key: MAPREDUCE-323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-323
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Amar Kamat
>            Assignee: Amareshwari Sriramadasu
>            Priority: Critical
> Today all the job history files are dumped in one _job-history_ folder. This can cause
> problems when there is a need to search the history folder (job recovery, etc.). It would be
> nice if we grouped all the jobs under a _user_ folder, so that all the jobs for user _amar_
> go in _history-folder/amar/_. Jobs can be categorized using various features like _jobid,
> date, jobname_, etc., but using _username_ will make the search much more efficient and also
> will not result in a namespace explosion.
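The per-user layout proposed in the issue (history-folder/amar/) can be sketched with java.nio on a local directory. This is a minimal sketch, assuming one empty placeholder file per job named after its job ID. The helper names, the temporary root directory, and the sample job IDs and user names are hypothetical. What it illustrates is that a search for one user's jobs lists a single sub-folder instead of the whole history folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical per-user history layout: <history-folder>/<user>/<jobId>.
final class UserHistoryLayout {
    static Path historyFileFor(Path historyFolder, String user, String jobId) {
        return historyFolder.resolve(user).resolve(jobId);
    }

    // Creates an empty placeholder history file under the user's folder.
    static Path record(Path historyFolder, String user, String jobId) throws IOException {
        Path file = historyFileFor(historyFolder, user, jobId);
        Files.createDirectories(file.getParent());
        return Files.createFile(file);
    }

    // A per-user search lists only that user's folder, not the whole history tree.
    static List<String> jobsFor(Path historyFolder, String user) throws IOException {
        Path userDir = historyFolder.resolve(user);
        if (!Files.isDirectory(userDir)) {
            return List.of();
        }
        try (Stream<Path> entries = Files.list(userDir)) {
            return entries.map(p -> p.getFileName().toString())
                          .sorted()
                          .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("history-folder");
        record(root, "amar", "job_200911031901_0001");
        record(root, "doug", "job_200911031901_0002");
        record(root, "amar", "job_200911031901_0003");
        System.out.println(jobsFor(root, "amar")); // [job_200911031901_0001, job_200911031901_0003]
    }
}
```

Because the user name is part of the path, finding all of amar's jobs never touches doug's entries. A real jobtracker would still need write access to each user's folder.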

This message is automatically generated by JIRA.