hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3245) Provide ability to persist running jobs (extend HADOOP-1876)
Date Mon, 21 Jul 2008 05:25:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615151#action_12615151 ]

Owen O'Malley commented on HADOOP-3245:
---------------------------------------

If we are counting on the TaskTrackers' reports to rebuild the state, we should have a safe-mode
equivalent where we wait for 2-3 minutes before launching new tasks; otherwise we will trash
the cluster as each new TaskTracker reports back. Please also make sure that the TaskTracker
does not reset and lose state if it gets an IOException when talking to the JobTracker.
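
A minimal sketch of such a safe-mode gate, under stated assumptions: the class
RecoverySafeMode and all of its methods are hypothetical names invented for
illustration and are not part of the JobTracker. It only shows the idea of
collecting TaskTracker reports while refusing to launch new tasks until a fixed
wait window has passed.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration only; not an existing Hadoop class. */
class RecoverySafeMode {
    private final long startMillis;   // when the restarted JobTracker came up
    private final long waitMillis;    // how long to hold off new task launches
    private final Set<String> reportedTrackers = new HashSet<>();

    RecoverySafeMode(long startMillis, long waitMillis) {
        this.startMillis = startMillis;
        this.waitMillis = waitMillis;
    }

    /** Record that a TaskTracker has reported back; repeats are counted once. */
    void recordReport(String trackerName) {
        reportedTrackers.add(trackerName);
    }

    int reportedCount() {
        return reportedTrackers.size();
    }

    /** New tasks may be launched only once the wait window has elapsed. */
    boolean mayLaunchTasks(long nowMillis) {
        return nowMillis - startMillis >= waitMillis;
    }
}
```

With a 3-minute window (180000 ms), a scheduler built this way would keep
accepting reports but would hand out no new tasks until mayLaunchTasks
returns true.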

However, rather than have the TaskTrackers store additional information about the final task
status of each completed task in RAM, I think we should reconsider the option of using the
JobHistory as a transaction log for each job. For storage on local disk, we probably should
support writing a second copy to NFS so that a different node could bring up the JobTracker.

In any case, the extra completed task state should be on disk rather than in RAM. We also need
to make sure that the JobHistory is complete and consistent even after the restoration.
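
To illustrate what treating a per-job history as a transaction log could look
like, here is a rough sketch. The record format ("taskId status", one record
per line) and the class TaskLogReplay are assumptions made up for this example;
they do not reflect the actual JobHistory file format. The point is only that
replaying the records in order yields the last known status of each task.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; the record format is invented. */
class TaskLogReplay {
    /**
     * Replays records of the assumed form "taskId status" in order.
     * A later record for the same task overwrites an earlier one.
     */
    static Map<String, String> replay(List<String> records) {
        Map<String, String> lastStatus = new LinkedHashMap<>();
        for (String record : records) {
            String[] fields = record.trim().split("\\s+");
            if (fields.length != 2) {
                // Skip a malformed record, e.g. one truncated by a crash.
                continue;
            }
            lastStatus.put(fields[0], fields[1]);
        }
        return lastStatus;
    }
}
```

Skipping an incomplete trailing record is one possible policy for a log cut
short by a crash; a real design would also need to decide how to keep the
history itself complete and consistent after the replay.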



> Provide ability to persist running jobs (extend HADOOP-1876)
> ------------------------------------------------------------
>
>                 Key: HADOOP-3245
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3245
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Amar Kamat
>         Attachments: HADOOP-3245-v2.5.patch, HADOOP-3245-v2.6.5.patch, HADOOP-3245-v2.6.9.patch, HADOOP-3245-v4.1.patch
>
>
> This could probably extend the work done in HADOOP-1876. This feature can be applied
> for things like jobs being able to survive jobtracker restarts.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

