hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3245) Provide ability to persist running jobs (extend HADOOP-1876)
Date Tue, 27 May 2008 15:58:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600166#action_12600166

Amar Kamat commented on HADOOP-3245:

Here is a proposal
1) A job can be one of the following states - _queued, running, completed_.
    Queued and running jobs needs to survive JT restarts and hence will be represented as
_queued_ and _running_ folders under backup directory. The reason we need to do this is basically
its cleaner to distinguish queued and running jobs. We need to see if we want this feature
to be configurable (in terms of enabling backup mode and specifying backup location) or always
check the _queued/running_ folders under *mapred.system.dir* to auto-detect if the JT got
restarted. For now this backup directory will be the *mapred.system.dir*.

2) On job submission, the job first gets queued up (goes in the job structures and also gets
persisted on the FS as queued). For now we will use a common structure to hold both the queued
and running jobs. In future (say after the new scheduler comes in) we might need to have separate
queues/lists for the same.

3) A job jumps from queued state to running state only when the {{JobInitThread}} selects
it for *initialization/running* [ consider initialized/expanded job is a running job]. As
of now all the jobs will transit from queued to running immediately. But in future the decision
of which job to initialize will be pretty complex/involved.

4)  Running jobs need following information for restarts :
   4.1) TIP info : What all TIPs are there in the job and what is their locality info. 
           This could be rebuilt from job.xml which is in *mapred.system.dir*. Hence on JT
restart, we should be careful 
           while clearing the mapred system directory. Currently the JobTracker will switch
to RESTART mode if there 
           is some stale data in the queued/running backup folders. If we decide to keep the
backup feature configurable then the
           JT will also check if its enabled.
   4.2) Completed task statuses : (Sameer's suggestion) TaskStatus info can be obtained from
the TTs. 
           More details are stated below (see SYNC ALGO)

5) _SYNC ALGO_ : Algo for sync up the JT and TTs :
   5.1)  On Demand Sync : 
     Have SYNC operation for the TaskTrackers. Following are the ways to achieve on-demand
       5.1.1) Greedy :
           a) TT sends an old heartbeat to a new restarted JT. The JT on restart check the
backup folders and detects
                if its in restart mode or not.
           b) Once the JT in restart mode receives a heartbeat which is not the *first* contact,
it considers that the 
               TT is from its previous incarnation and sends a SYNC command.
           c) TT receives a SYNC operation, adds the completed statuses of running jobs to
the current heartbeat 
               and SYNCs up with the JT making this contact as *initial*.
           d) JT receives the updated heartbeat as a new contact, updates the internal structures.
           e) JT replies with new tasks if asked.
     5.1.2) Waited :
           Similar to 6.1.1 but doesn't give out tasks immediately. Waits for some time and
then serves out the tasks. 
          The question to answer is how much to wait? How to detect that all the TTs have
     For 5.1.1, the rate at which the TTs SYNCs with the JT will be faster and hence the 
     overhead should not be much. Also we could avoid scheduling tasks on SYNC operation.
   5.2) Other options?

6) Problems/Issues : 
I) Once the JT restarts, the JIP structures for previously completed jobs will be missing.
Hence the web-ui will now change in terms of _completed_ jobs. Earlier the JT showed the completed
jobs which on restart it will not be able to. One work around is to use _completed-job-store_
to store completed jobs and serve completed jobs from job history on restarts.

II) Consider the following scenario :
   1. JT schedules task1, task2 to TT1
   2. JT schedules task3, task4 to TT2
   3. TT1 informs JT about task1/task2 completion
   4. JT restarts 
   5. JT receives SYNC from TT1.
   6. JT syncs up and schedules task3 to TT1
   7. TT1 starts task3 and this might interfere with the side effect files of task3 on TT2.
   8. In the worst case task3 could be running on TT1 and JT schedules task3 on TT1 in which
case the local folders will also 
    One way to overcome this is to include identifiers to distinguish between the task attempts
across JT restarts. We can use JT's timestamp as an identifier.

III) The logic for _detecting_ lost TT should not rely on missing data structures but use
some kind of book keeping.  We can now use 'missing data structures logic' for detecting when
the TT should SYNC. Note that detecting a TT as lost (missing TT details) if different from
declaring it as lost (10min gap in heartbeat).
So for now we should
1) Have backup as non configurable and use *mapred.system.dir* as the backup folder with _queued/running_
folders under it
2) Have queuing logic just for persistence 
3) Use job-history for serving completed jobs upon restarts
4) Change lost TT _detection_ logic
5) Use _On-Demand:Greedy_ sync logic 
6) Task attempts carry encoded JT timestamp with them

> Provide ability to persist running jobs (extend HADOOP-1876)
> ------------------------------------------------------------
>                 Key: HADOOP-3245
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3245
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Amar Kamat
>             Fix For: 0.18.0
> This could probably extend the work done in HADOOP-1876. This feature can be applied
for things like jobs being able to survive jobtracker restarts.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message