hadoop-mapreduce-issues mailing list archives

From "Dick King (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-323) Improve the way job history files are managed
Date Fri, 28 May 2010 22:02:39 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873160#action_12873160 ]

Dick King commented on MAPREDUCE-323:


1: I will have to fix Rumen to descend recursively into a directory of directories so that
it can ingest a history directory.

1a: I would still like to process the job IDs in lexicographical order [which is almost always
chronological order] for compatibility with applications that expect approximately chronological
order.

1b: This creates a memory footprint of about 200 bytes per entry, which may impose a limit of
one million jobs or so.
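
To make points 1 through 1b concrete, here is a minimal sketch in Python (not Rumen's actual
code) of a recursive descent that still yields files in job ID order. It assumes each history
file name begins with its job ID, and the function name {{list_history_files}} is hypothetical.
Every path is held in memory before sorting, which is where a per-entry footprint like the one
in 1b would come from.

```python
import os


def list_history_files(root):
    """Recursively collect every file under root, ordered by file name.

    Assumption: file names start with the job ID, so sorting on the base
    name gives lexicographical job ID order regardless of subdirectory.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    # All entries are kept in memory so they can be sorted across directories.
    # The full path is the tie-breaker to keep the order deterministic.
    found.sort(key=lambda path: (os.path.basename(path), path))
    return found
```

Sorting on the base name rather than the full path means the chosen directory layout does not
change the processing order.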

2: I will make the directories configurable.  How about the following controls?

  |{{%y}}    |year [four digits] [The Y10K problem will be someone else's problem :-) ]|
  |{{%m}}   |month [two digits, leading zeros present]|
  |{{%d}}    |day [two digits, leading zeros present]|
  |{{%h}}    |hour [two digits, leading zeros present]|
  |{{%i}}     |mInute [two digits, leading zeros present]|
  |{{%u}}    |user|
  |{{%xi-j}}  |the digits from the jobID index whose positions run from {{i}} through {{j}},
_downwards_, numbered _from the right, 1-based_.  If you choose any digits that don't exist
you get no characters in the output for those digits.  {{%x9-3}} will give you directories
holding logs for at most 100 jobs, unless you omit timestamp selection controls.|
  |{{/}}         |directory component separator [even on platforms with a different separator
character] -- if there are two or more slashes in a row we swallow all but one, and note that
there's an implicit leading and trailing separator character|
  |any other character   |itself|

Did I leave anything out?
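
One possible reading of the controls above, sketched in Python for illustration only. The
function name {{expand_pattern}} and its parameters are hypothetical, and two details are
assumptions rather than part of the proposal: the jobID index is passed in as a string of
digits, and slashes are collapsed after the controls have been expanded.

```python
import re
from datetime import datetime

# Matches %xi-j (groups 1 and 2) or one of the single-letter controls (group 3).
_TOKEN = re.compile(r"%x(\d+)-(\d+)|%([ymdhiu])")


def expand_pattern(pattern, when, user, job_index):
    """Expand a directory pattern into a path with leading and trailing '/'."""

    def substitute(match):
        if match.group(1) is not None:
            high, low = int(match.group(1)), int(match.group(2))
            digits = []
            # Positions are 1-based from the right and run downwards.
            for pos in range(high, low - 1, -1):
                if 1 <= pos <= len(job_index):
                    digits.append(job_index[-pos])
                # Positions that do not exist contribute no characters.
            return "".join(digits)
        fields = {
            "y": f"{when.year:04d}",
            "m": f"{when.month:02d}",
            "d": f"{when.day:02d}",
            "h": f"{when.hour:02d}",
            "i": f"{when.minute:02d}",
            "u": user,
        }
        return fields[match.group(3)]

    # Any character that is not part of a control is copied through unchanged.
    expanded = _TOKEN.sub(substitute, pattern)
    # Collapse runs of slashes and add the implicit leading/trailing separator.
    parts = [part for part in expanded.split("/") if part]
    return "/" + "/".join(parts) + "/" if parts else "/"
```

Under these assumptions, the pattern {{%y/%m/%d/%u/%x9-3}} for user amar, the timestamp on this
message, and a jobID index of 0042 expands to {{/2010/05/28/amar/00/}}, because only positions
4 and 3 exist in a four-digit index.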

> Improve the way job history files are managed
> ---------------------------------------------
>                 Key: MAPREDUCE-323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-323
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Amar Kamat
>            Assignee: Dick King
>            Priority: Critical
> Today all the job history files are dumped in one _job-history_ folder. This can cause
problems when there is a need to search the history folder (job-recovery etc). It would be
nice if we grouped all the jobs under a _user_ folder, so that all the jobs for user _amar_
go in _history-folder/amar/_. Jobs can be categorized using various features like _jobid,
date, jobname_ etc, but using _username_ will make the search much more efficient and will
also not result in namespace explosion.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
