hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4670) Improve the way job history files are managed
Date Tue, 03 Mar 2009 09:44:57 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678270#action_12678270 ]

Amar Kamat commented on HADOOP-4670:
------------------------------------

I had an offline discussion with Devaraj, Hemanth and Sharad. It seems the following structure
should solve this issue:
# old history files: path-to-job-history/
# history files for the jobtracker on host hostname: path-to-job-history/hostname
# history files for user username using the jobtracker running on hostname: path-to-job-history/hostname/username
# job history file name format: <start-time>_<jobid>_<jobname>
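
The layout above can be sketched as a small path builder. This is only an illustration in Python, not the JobTracker's code: the function names, the host and user values, and the choice of a numeric start time are all assumptions made for the example.

```python
import posixpath


def history_dir(root, hostname=None, username=None):
    """Return the history directory for one level of the proposed layout.

    root only              -> old history files
    root + hostname        -> files for the jobtracker on that host
    root + hostname + user -> files for that user on that jobtracker
    """
    if username is not None and hostname is None:
        raise ValueError("a per-user directory needs a jobtracker hostname")
    parts = [root]
    if hostname is not None:
        parts.append(hostname)
    if username is not None:
        parts.append(username)
    return posixpath.join(*parts)


def history_file_name(start_time, job_id, job_name):
    """Build a file name of the form <start-time>_<jobid>_<jobname>."""
    return f"{start_time}_{job_id}_{job_name}"


# Hypothetical values, for illustration only.
user_dir = history_dir("path-to-job-history", "jt-host", "amar")
file_name = history_file_name(1236073497000, "job_200903030944_0001", "wordcount")
print(posixpath.join(user_dir, file_name))
```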

Structuring it further by year, month and day might prove useful, but for now it looks like
a premature step. If needed, we can add it later. Until then, users who submit jobs at a very
high rate will be affected more than users who submit jobs less frequently, since all of a
user's history files land in one folder. Searching will be easier per-user.
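
As a rough illustration of the per-user search, the sketch below filters and orders the file names found in one user's folder. It assumes names follow <start-time>_<jobid>_<jobname>, that the start time is numeric, and that job ids have the usual job_<identifier>_<sequence> shape. The helper names and sample file names are made up for the example.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: <digits>_job_<digits>_<digits>_<anything>.
# The job name comes last, so it may itself contain underscores.
_NAME = re.compile(r"^(?P<start>\d+)_(?P<jobid>job_\d+_\d+)_(?P<jobname>.+)$")


class HistoryEntry(NamedTuple):
    start_time: int
    job_id: str
    job_name: str


def parse_history_name(name: str) -> Optional[HistoryEntry]:
    """Split a history file name into its parts, or return None if it does not match."""
    m = _NAME.match(name)
    if m is None:
        return None
    return HistoryEntry(int(m["start"]), m["jobid"], m["jobname"])


def jobs_for_user(file_names, job_name=None):
    """Return one user's history entries, oldest first, optionally filtered by job name."""
    entries = [e for e in map(parse_history_name, file_names) if e is not None]
    if job_name is not None:
        entries = [e for e in entries if e.job_name == job_name]
    return sorted(entries, key=lambda e: e.start_time)
```

Only the names in path-to-job-history/hostname/username need to be examined, so the cost of a lookup depends on one user's job count rather than on every job the jobtracker has run.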

Future steps:
1) Add date-level info to the structure, or at least to the display
2) Add indexing info for faster access/display
3) Provide various views, such as recent jobs, sort by day/week/month/year, jobname (sorting and
structuring) etc.
4) Secure access
5) Faster access and analysis (involves changes/tweaks to JobHistory and parsing).

Thoughts?

> Improve the way job history files are managed
> ---------------------------------------------
>
>                 Key: HADOOP-4670
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4670
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>
> Today all the jobhistory files are dumped in one _job-history_ folder. This can cause
> problems when there is a need to search the history folder (job-recovery etc). It would be
> nice if we grouped all the jobs under a _user_ folder, so that all the jobs for user _amar_
> go in _history-folder/amar/_. Jobs can be categorized using various features like _jobid,
> date, jobname_ etc, but using _username_ will make the search much more efficient and also
> will not result in namespace explosion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

