hadoop-mapreduce-issues mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-828) Provide a mechanism to pause the jobtracker
Date Thu, 06 Aug 2009 06:29:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739912#action_12739912 ]

Hemanth Yamijala commented on MAPREDUCE-828:
--------------------------------------------

Broadly, the requirements for the feature seem to be:
- Provide a command to pause/resume the JT.
- The pause command will pause all scheduling: no new tasks will be launched and the
tasktrackers will take no further action, in order to minimize the chance of failures.
- Currently running tasks may continue and either finish or fail. Failed tasks will not be
re-launched until the JT is resumed, so there is a much lower chance that jobs will fail
because of task failures.
- The pause command will stop initialization of jobs. Jobs being initialized at the instant
of the pause may fail. We are not planning to handle this case, in the interest of simplicity.
- Jobs submitted to the JT will be queued up. However, if the job client fails to write the
job files to the DFS (the step before job submission), those jobs will be lost.
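
As a rough illustration of the requirements above, here is a minimal sketch of the pause/resume state handling. It is not the JobTracker's actual API: the class PausableTracker and all of its method names are hypothetical, jobs and tasks are reduced to plain strings, and each job is assumed to produce a single task.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch only; none of these names exist in Hadoop.
class PausableTracker {
    private boolean paused = false;
    // Jobs accepted while paused, waiting to be initialized.
    private final Deque<String> submittedJobs = new ArrayDeque<>();
    // Tasks ready to be handed to a tasktracker.
    private final Deque<String> runnableTasks = new ArrayDeque<>();
    // Tasks that failed while paused; re-launched only after resume.
    private final Deque<String> failedTasks = new ArrayDeque<>();

    synchronized void pause() {
        paused = true;
    }

    synchronized void resume() {
        paused = false;
        // Re-queue tasks that failed during the pause.
        while (!failedTasks.isEmpty()) {
            runnableTasks.addLast(failedTasks.removeFirst());
        }
        // Initialize jobs that were queued up during the pause.
        while (!submittedJobs.isEmpty()) {
            initJob(submittedJobs.removeFirst());
        }
    }

    synchronized boolean isPaused() {
        return paused;
    }

    // Submission is accepted in either state; initialization is deferred while paused.
    synchronized void submitJob(String jobId) {
        if (paused) {
            submittedJobs.addLast(jobId);
        } else {
            initJob(jobId);
        }
    }

    // Assumption for brevity: initializing a job yields exactly one task.
    private void initJob(String jobId) {
        runnableTasks.addLast(jobId + "_task_0");
    }

    // A running task may still fail while paused; it is held back until resume.
    synchronized void taskFailed(String taskId) {
        if (paused) {
            failedTasks.addLast(taskId);
        } else {
            runnableTasks.addLast(taskId);
        }
    }

    // Called when a tasktracker reports free slots; returns the tasks to launch.
    synchronized List<String> heartbeat(int freeSlots) {
        List<String> assigned = new ArrayList<>();
        if (paused) {
            return assigned; // no scheduling while paused
        }
        while (freeSlots-- > 0 && !runnableTasks.isEmpty()) {
            assigned.add(runnableTasks.removeFirst());
        }
        return assigned;
    }
}
```

The sketch does not cover the two cases called out above as unhandled: a job whose initialization is in flight at the moment of the pause, and a job client that fails to write its job files to the DFS before submission.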


> Provide a mechanism to pause the jobtracker
> -------------------------------------------
>
>                 Key: MAPREDUCE-828
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-828
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: jobtracker
>            Reporter: Hemanth Yamijala
>
> We've seen scenarios where we have needed to stop the namenode for a maintenance activity.
> In such scenarios, if the jobtracker (JT) continues to run, jobs would fail due to initialization
> or task failures (due to DFS). We could restart the JT with job recovery enabled in such
> scenarios, but a restart has proved to be a very intrusive activity, particularly if the JT
> is not itself at fault and does not require a restart. The ask is for an admin-controlled feature
> to pause the JT, which would take it to a state somewhat analogous to the safe mode of DFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

