hadoop-yarn-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart
Date Mon, 12 May 2014 07:37:17 GMT

    [ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994886#comment-13994886 ]

Vinod Kumar Vavilapalli commented on YARN-556:

Tx for the community update, Karthik.

Also, Jian/Abhinav, can you both please file all the known sub-tasks and assign to
yourselves only the ones you are working on right away? Other folks like [~ozawa] and [~rohithsharma]
have repeatedly expressed interest in working on this feature. It'll be great
to find stuff for everyone instead of creating all tickets and assigning them to the two of
you. Thanks.

[~ozawa] and [~rohithsharma], let others know what you specifically want to work on, if you
have something in mind.

bq.  6. clustertimestamp is added to containerId so that containerId after RM restart do not
clash with containerId before (as the containerId counter resets to zero in memory)
I totally missed this line item. Can you give more detail on what the problem is and what
the proposal is? What is done in the prototype patch is a major compatibility issue - I'd
like to avoid it if we can.
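
To make the quoted line item concrete, here is a minimal sketch of the clash it describes, under stated assumptions. The names SketchContainerId, SketchResourceManager and rm_start_timestamp are hypothetical and are not the real YARN ContainerId or the prototype patch; the sketch only models a counter that lives in memory and restarts from zero.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass(frozen=True)
class SketchContainerId:
    """Hypothetical stand-in for a container id; not the real YARN class."""
    app_attempt: str
    sequence: int
    # Assumption: the extra field proposed in the quoted item, modelled here
    # as the start time of the RM instance that issued the id.
    rm_start_timestamp: Optional[int] = None


class SketchResourceManager:
    """Hypothetical RM that hands out ids from an in-memory counter."""

    def __init__(self, start_timestamp: int, include_timestamp: bool):
        self.start_timestamp = start_timestamp
        self.include_timestamp = include_timestamp
        # In memory only, so a restarted RM begins again at zero.
        self._counter = count(0)

    def allocate(self, app_attempt: str) -> SketchContainerId:
        ts = self.start_timestamp if self.include_timestamp else None
        return SketchContainerId(app_attempt, next(self._counter), ts)


def ids_clash_across_restart(include_timestamp: bool) -> bool:
    """Allocate one id before and one after a simulated restart and compare."""
    before = SketchResourceManager(1000, include_timestamp).allocate("attempt_1")
    after = SketchResourceManager(2000, include_timestamp).allocate("attempt_1")
    return before == after
```

In this model the two ids are equal when the timestamp is left out and distinct when it is included. The sketch also shows where the compatibility concern comes from: the id gains a field, so anything that parses or compares the old form would see a different value.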

> RM Restart phase 2 - Work preserving restart
> --------------------------------------------
>                 Key: YARN-556
>                 URL: https://issues.apache.org/jira/browse/YARN-556
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: Work Preserving RM Restart.pdf, WorkPreservingRestartPrototype.001.patch
> YARN-128 covered storing the state needed for the RM to recover critical information.
> This umbrella jira will track changes needed to recover the running state of the cluster so
> that work can be preserved across RM restarts.

This message was sent by Atlassian JIRA
