hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3245) Provide ability to persist running jobs (extend HADOOP-1876)
Date Tue, 08 Jul 2008 12:04:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611539#action_12611539 ]

Devaraj Das commented on HADOOP-3245:
-------------------------------------

Some initial comments:
1) Remove the unnecessary comments from JobTracker.java
2) Rename the "restarted" field as "recovering"
3) hasJobTrackerRestarted/Recovered API
4) Remove the comment "//TODO wait for all the incomplete(previously running) jobs to be ready" from offerService
5) Put back the call to completedJobStatusStore.store in finalizeJob
6) The method cleanupJob seems unnecessary. The cleanup that is already done will continue to work.
7) The implementations of wasRecovered and hasRecovered should not call back into the JobTracker
8) The synchronization on tasksInited in initTasks is redundant. On the following line, call notify instead of notifyAll.
9) In the interval between the JT's death and its restart, the reducers might fail to fetch map outputs from some tasktrackers (due to faulty map nodes, etc.), but they have no one to send the notifications to. The reducers might end up killing themselves after a couple of retries.
10) The construction of TaskTrackerStatus should be reverted to how it was done earlier (cloneAndResetRunningTaskStatuses called inline with the constructor invocation)
11) In TaskTracker.transmitHeartBeat you should call cloneAndResetRunningJobTaskStatuses rather than cloneAndResetRunningTaskStatuses
12) Please move the SYNC action handling to the offerService method
13) shouldResetEventsIndex could be cleared upon the first access rather than in the heartbeat processing
14) Instead of adding an RPC to Umbilical, you can add an argument to getMapCompletionEvents that indicates whether to reset or not
15) Factor out common code from cloneAndResetRunningJobTaskStatuses/cloneAndResetRunningTaskStatuses
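
To illustrate comments 13 and 14, here is a minimal Java sketch of one way to read them: the reset flag is cleared the first time a reducer asks for events, and it travels with the reply to the events call, so no extra RPC is needed. This is not the patch's code. The class, method, and field names (ResetOnFirstAccessSketch, EventSource, EventsUpdate, onRestart, getEvents) are hypothetical, and carrying the flag in the return value is an assumption about how the extra argument might be realised.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ResetOnFirstAccessSketch {

    /** Hypothetical reply: the events plus a flag telling the caller to start over from index 0. */
    static final class EventsUpdate {
        final boolean reset;
        final List<String> events;

        EventsUpdate(boolean reset, List<String> events) {
            this.reset = reset;
            this.events = events;
        }
    }

    /** Hypothetical stand-in for the tracker-side event cache; not the real TaskTracker. */
    static final class EventSource {
        private final List<String> events = new ArrayList<>();
        private final Set<String> knownReducers = new HashSet<>();
        private final Set<String> pendingReset = new HashSet<>();

        synchronized void addEvent(String event) {
            events.add(event);
        }

        /** Assumed hook: invoked once when a restart is detected and the event list is rebuilt. */
        synchronized void onRestart(List<String> replayedEvents) {
            events.clear();
            events.addAll(replayedEvents);
            pendingReset.addAll(knownReducers);
        }

        /** The flag for a reducer is cleared here, on its first access, not in heartbeat handling. */
        synchronized EventsUpdate getEvents(String reducerId, int fromIndex, int max) {
            knownReducers.add(reducerId);
            boolean reset = pendingReset.remove(reducerId); // true at most once per restart
            int start = reset ? 0 : Math.min(fromIndex, events.size());
            int end = (int) Math.min((long) start + max, events.size());
            return new EventsUpdate(reset, new ArrayList<>(events.subList(start, end)));
        }
    }

    public static void main(String[] args) {
        EventSource source = new EventSource();
        source.addEvent("map_0");
        source.addEvent("map_1");
        EventsUpdate before = source.getEvents("r1", 0, 10);
        source.onRestart(Arrays.asList("map_1"));
        EventsUpdate after = source.getEvents("r1", 2, 10);
        System.out.println(before.reset + " " + before.events);
        System.out.println(after.reset + " " + after.events);
    }
}
```

Keeping the pending-reset state per reducer means every reducer sees the flag exactly once after a restart, which a single shared boolean cleared on first access would not guarantee.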


> Provide ability to persist running jobs (extend HADOOP-1876)
> ------------------------------------------------------------
>
>                 Key: HADOOP-3245
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3245
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Amar Kamat
>         Attachments: HADOOP-3245-v2.5.patch, HADOOP-3245-v2.6.5.patch, HADOOP-3245-v2.6.9.patch
>
>
> This could probably extend the work done in HADOOP-1876. This feature can be applied
> for things like jobs being able to survive jobtracker restarts.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

