hadoop-yarn-issues mailing list archives

From "Tsuyoshi OZAWA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
Date Thu, 12 Jun 2014 18:15:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029539#comment-14029539 ]

Tsuyoshi OZAWA commented on YARN-2052:
--------------------------------------

[~jianhe], I think it's OK once the fencing operation has completed, but one problem is that
{{recover()}} is invoked before the fencing. My idea for dealing with the problem is as follows:

1. The active RM stores the current epoch value.
2. After failover, the new active RM recovers the epoch and treats its epoch value as
{{epoch + 1}}.
3. The new active RM issues {{fence()}} on ZKRMStateStore and increments the epoch.
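
A minimal Java sketch of the three steps above, for illustration only. It assumes that "increment epoch" in step 3 means persisting the incremented value after fencing. {{InMemoryStateStore}}, {{ResourceManagerSketch}}, and the container ID string format are hypothetical names invented for this sketch; they are not the ZKRMStateStore API or the actual YARN ContainerId format.

```java
import java.util.concurrent.atomic.AtomicLong;

class EpochSketch {

    /** Hypothetical in-memory stand-in for the state store (NOT the real ZKRMStateStore API). */
    static class InMemoryStateStore {
        private long storedEpoch = 0;  // last epoch persisted by an active RM
        private String owner = null;   // RM currently allowed to write

        synchronized long loadEpoch() {
            return storedEpoch;
        }

        /** After fencing, only rmId may write; writes from older RMs are rejected. */
        synchronized void fence(String rmId) {
            owner = rmId;
        }

        synchronized void storeEpoch(String rmId, long epoch) {
            if (!rmId.equals(owner)) {
                throw new IllegalStateException(rmId + " has been fenced out");
            }
            storedEpoch = epoch;
        }
    }

    /** Hypothetical RM that follows steps 1-3 when it becomes active. */
    static class ResourceManagerSketch {
        private final String rmId;
        private final InMemoryStateStore store;
        private final AtomicLong sequence = new AtomicLong(); // not persisted, restarts at 0
        private long epoch;

        ResourceManagerSketch(String rmId, InMemoryStateStore store) {
            this.rmId = rmId;
            this.store = store;
        }

        void becomeActive() {
            // Step 2: recovery runs before fencing, so only read the stored
            // value here and treat the local epoch as epoch + 1.
            epoch = store.loadEpoch() + 1;
            // Step 3: fence the store, then persist the incremented epoch.
            // Step 1: the stored value is what the next active RM will recover.
            store.fence(rmId);
            store.storeEpoch(rmId, epoch);
        }

        long getEpoch() {
            return epoch;
        }

        /** Illustrative ID format only: app identifier, epoch, sequence number. */
        String newContainerId(String appId) {
            return appId + "_e" + epoch + "_" + sequence.incrementAndGet();
        }
    }
}
```

In this sketch, two RMs that become active one after the other both restart their sequence numbers at 1, but their epochs differ, so the IDs they generate for the same app do not collide. A write attempted by the old RM after the new one has fenced the store is rejected.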

> ContainerId creation after work preserving restart is broken
> ------------------------------------------------------------
>
>                 Key: YARN-2052
>                 URL: https://issues.apache.org/jira/browse/YARN-2052
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Tsuyoshi OZAWA
>
> Container ids are made unique by using the app identifier and appending a monotonically
> increasing sequence number to it. Since container creation is a high-churn activity, the RM
> does not store the sequence number per app. So after a restart it does not know what the new
> sequence number should be for new allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
