hadoop-mapreduce-issues mailing list archives

From "Dick King (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-323) Improve the way job history files are managed
Date Wed, 28 Jul 2010 19:12:21 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893316#action_12893316 ]

Dick King commented on MAPREDUCE-323:

Here are the new APIs I propose:

All are public static methods of {{JobHistory}}.

All methods return only items from the done directory.  Techniques for 


    Path getJobHistoryPath(JobID id) throws IOException

    Path jobPathToConfPath(Path jobPath) throws IOException 
      // Computed purely in memory; guaranteed not to read the file.
      // For a syntactically legal Path that does not correspond to an actual
      // job, this can either return the corresponding conf Path (which also
      // will not exist) or throw an exception.

    Iterator<Path> getMatchingJob
             (String user, String jobnameSubstring, String[] dateStrings)
          throws IOException
      // the returned iterator does not support remove()
      // any criterion can be null
      // filtering is conjunctive
      // dates are MM/DD/YYYY
      // results are returned in arbitrary order
      // a file added after the iterator is created may or may not be
      //   returned by it
      // dates are approximations of completion time

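The filtering semantics described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the proposed implementation: {{MatchingJobsSketch}}, {{JobEntry}}, and the in-memory list standing in for the done directory are all hypothetical, and the rule that a job passes the date criterion when its date equals any one of the supplied strings is an assumption the proposal does not spell out. Taking a snapshot of the matches up front is one of the two behaviors the proposal allows for files added later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch only: none of these names exist in Hadoop.
final class MatchingJobsSketch {

    // Hypothetical stand-in for one completed job in the done directory.
    static final class JobEntry {
        final String user;
        final String jobName;
        final String date; // MM/DD/YYYY, approximate completion date

        JobEntry(String user, String jobName, String date) {
            this.user = user;
            this.jobName = jobName;
            this.date = date;
        }
    }

    // A null criterion is skipped; every non-null criterion must hold.
    static boolean matches(JobEntry job, String user,
                           String jobnameSubstring, String[] dateStrings) {
        if (user != null && !user.equals(job.user)) {
            return false;
        }
        if (jobnameSubstring != null && !job.jobName.contains(jobnameSubstring)) {
            return false;
        }
        // Assumption: the date criterion passes if the job's date equals
        // any one of the supplied strings.
        if (dateStrings != null && !Arrays.asList(dateStrings).contains(job.date)) {
            return false;
        }
        return true;
    }

    static Iterator<JobEntry> getMatchingJobs(List<JobEntry> done, String user,
                                              String jobnameSubstring,
                                              String[] dateStrings) {
        // Snapshot the matches; entries added to 'done' later are not seen.
        List<JobEntry> hits = new ArrayList<>();
        for (JobEntry job : done) {
            if (matches(job, user, jobnameSubstring, dateStrings)) {
                hits.add(job);
            }
        }
        // The iterator of an unmodifiable list throws
        // UnsupportedOperationException from remove().
        return Collections.unmodifiableList(hits).iterator();
    }

    public static void main(String[] args) {
        List<JobEntry> done = Arrays.asList(
            new JobEntry("amar", "wordcount-1", "07/28/2010"),
            new JobEntry("amar", "sort", "07/27/2010"),
            new JobEntry("dking", "wordcount-2", "07/28/2010"));
        // User and job-name criteria set, date criterion left null.
        Iterator<JobEntry> it = getMatchingJobs(done, "amar", "wordcount", null);
        while (it.hasNext()) {
            System.out.println(it.next().jobName);
        }
    }
}
```

Passing null for all three criteria returns every entry, since a null criterion filters nothing out.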

> Improve the way job history files are managed
> ---------------------------------------------
>                 Key: MAPREDUCE-323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-323
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Amar Kamat
>            Assignee: Dick King
>            Priority: Critical
> Today all the jobhistory files are dumped in one _job-history_ folder. This can cause
> problems when there is a need to search the history folder (job-recovery etc.). It would be
> nice if we grouped all the jobs under a _user_ folder. So all the jobs for user _amar_ will
> go in _history-folder/amar/_. Jobs can be categorized using various features like _jobid,
> date, jobname_ etc., but using _username_ will make the search much more efficient and also
> will not result in namespace explosion.
