hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3245) Provide ability to persist running jobs (extend HADOOP-1876)
Date Mon, 08 Sep 2008 06:30:46 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629080#action_12629080 ]

Amar Kamat commented on HADOOP-3245:

One comment on the patch. 

_Approach:_

The way history renaming is done in this patch is as follows:
   - Given the job-id, job-name and user-name, try to find a file in the history
folder that matches the pattern: jt-hostname_[0-9]*_jobid_jobname_username.
   - If any file matches the pattern, say file _f_, then use _f.recover_ as the new file for
history. If _f.recover_ already exists (i.e., a previous restart was interrupted), rename
_f.recover_ to _f_ and then use _f.recover_ as the new file for history.
   - On successful recovery, delete _f_.
   - On job completion, rename _f.recover_ to _f_.
   - If the jobtracker restarts in between, use the older file as the file for recovery.
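The filename lookup in the first step above can be sketched as follows. This is an illustrative sketch, not the patch's actual code; the class and method names are hypothetical, and the pattern simply mirrors jt-hostname_[0-9]*_jobid_jobname_username as described:

```java
import java.util.regex.Pattern;

// Hypothetical helper sketching the history-filename match described above.
class HistoryFileMatcher {

    // Does fileName match: <jt-hostname>_<timestamp>_<jobid>_<jobname>_<username> ?
    // The components themselves are quoted so regex metacharacters in, e.g.,
    // the job name cannot change the meaning of the pattern.
    static boolean matches(String fileName, String jtHostname,
                           String jobId, String jobName, String userName) {
        Pattern p = Pattern.compile(
            Pattern.quote(jtHostname) + "_[0-9]*_" +
            Pattern.quote(jobId) + "_" +
            Pattern.quote(jobName) + "_" +
            Pattern.quote(userName));
        return p.matcher(fileName).matches();
    }
}
```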

_Problem:_

With trunk, only one DFS access is made while starting the log process for a job. With this
patch there will be four DFS accesses:
   - Check if the job has a history _file_ _[false for new jobs]_
   - Check if _file_ exists _[false for new jobs]_
   - Check if _file.recover_ exists _[false for new jobs]_
   - Open _file_ for logging

I think it makes more sense to create a new job history file upon every restart. Before starting
the recovery process, delete all the history files related to the job except the oldest one.
Note that the history filename has a timestamp in it, so detecting the oldest file is easy.

_Example:_
Say the job started with timestamp t1. The job history filename would be _hostname_t1_jobid_jobname_username_.
Upon restart, delete all the files related to the job except the oldest one. The new filename
would be _hostname_t2_jobid_jobname_username_. Use _hostname_t1_jobid_jobname_username_ as
the source for recovery. If the jobtracker dies while recovering, there will be 2 history
files for the job; delete _hostname_t2_jobid_jobname_username_ upon restart and again use
_hostname_t1_jobid_jobname_username_ for recovery. If the recovery is successful, delete
_hostname_t1_jobid_jobname_username_ just to make sure that the latest history file will be
used upon the next restart. There is no renaming and no temp file involved in this approach.

Note that at any given time there will be at most 2 history files per job.
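The "keep only the oldest file" step of this scheme can be sketched as below. Again a hypothetical sketch, not the patch: it assumes the hostname contains no underscore, so the timestamp is always the second underscore-separated field of the filename.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the timestamp-based cleanup proposed above.
class HistoryCleanupSketch {

    // Extract the timestamp field from <hostname>_<timestamp>_<jobid>_<jobname>_<username>.
    // Assumes the hostname itself contains no underscore.
    static long timestampOf(String fileName) {
        return Long.parseLong(fileName.split("_")[1]);
    }

    // Given all history files found for one job, return the files to delete:
    // everything except the oldest (smallest timestamp), which is kept as the
    // source for recovery.
    static List<String> filesToDelete(List<String> historyFiles) {
        String oldest = Collections.min(historyFiles,
            Comparator.comparingLong(HistoryCleanupSketch::timestampOf));
        List<String> doomed = new ArrayList<>(historyFiles);
        doomed.remove(oldest);
        return doomed;
    }
}
```

On restart the jobtracker would delete everything `filesToDelete` returns, then open a fresh file stamped with the new start time; no rename is ever needed.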

> Provide ability to persist running jobs (extend HADOOP-1876)
> ------------------------------------------------------------
>                 Key: HADOOP-3245
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3245
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Amar Kamat
>         Attachments: HADOOP-3245-v2.5.patch, HADOOP-3245-v2.6.5.patch, HADOOP-3245-v2.6.9.patch,
HADOOP-3245-v4.1.patch, HADOOP-3245-v5.13.patch, HADOOP-3245-v5.14.patch, HADOOP-3245-v5.26.patch,
HADOOP-3245-v5.30-nolog.patch, HADOOP-3245-v5.31.3-nolog.patch, HADOOP-3245-v5.33.1.patch,
> This could probably extend the work done in HADOOP-1876. This feature can be applied
for things like jobs being able to survive jobtracker restarts.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
