hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
Date Tue, 13 May 2014 17:42:15 GMT

    [ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996675#comment-13996675 ]

Bikas Saha commented on YARN-2052:

The RM identifier is effectively the epoch of the RM. The NM already uses it to distinguish
allocations made by the old RM from those made by the new RM. However, because the container
id embeds the appId, we cannot use this epoch number there: the appId cannot change across
restarts for containers belonging to the same app. Changing that would be backwards incompatible.
One alternative would be to replace the monotonically increasing sequence number with
a unique identifier such as a UUID, but that is also incompatible.
Another alternative is to create a separate epoch number for the RM, in addition to the cluster
timestamp. The monotonically increasing sequence could then be a combination (concatenation) of
the new epoch number and the sequence number, e.g. container_XXX_1000 after epoch 1. When
the epoch number is 0 we can drop it, so container ids look the same as today,
e.g. container_XXX_000.
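
To make the concatenation idea concrete, here is a minimal sketch in Python. It is not YARN code: the function names, the "XXX" placeholder for the app part of the id, and the 3-digit sequence width are assumptions taken only from the container_XXX_1000 / container_XXX_000 examples above.

```python
# Hypothetical sketch of the "epoch + sequence" concatenation proposal.
# SEQ_WIDTH = 3 is an assumption that mirrors the 3-digit example in the
# comment; it is not a statement about the real ContainerId format.
SEQ_WIDTH = 3


def format_container_id(app_part: str, epoch: int, seq: int,
                        seq_width: int = SEQ_WIDTH) -> str:
    """Build an id whose last field is the epoch followed by the padded sequence."""
    if epoch < 0 or seq < 0:
        raise ValueError("epoch and seq must be non-negative")
    if seq >= 10 ** seq_width:
        # A longer sequence would be indistinguishable from a larger epoch.
        raise ValueError("sequence number does not fit in seq_width digits")
    seq_text = str(seq).zfill(seq_width)
    # Epoch 0 is dropped so that ids look the same as before any restart.
    epoch_text = "" if epoch == 0 else str(epoch)
    return f"container_{app_part}_{epoch_text}{seq_text}"


def parse_epoch_and_seq(container_id: str,
                        seq_width: int = SEQ_WIDTH) -> tuple[int, int]:
    """Split the last field back into (epoch, sequence)."""
    suffix = container_id.rsplit("_", 1)[1]
    epoch_text, seq_text = suffix[:-seq_width], suffix[-seq_width:]
    return (int(epoch_text) if epoch_text else 0, int(seq_text))


if __name__ == "__main__":
    print(format_container_id("XXX", 0, 0))
    print(format_container_id("XXX", 1, 0))
    print(parse_epoch_and_seq("container_XXX_1000"))
```

Under these assumptions, an RM that bumps the epoch on every restart can begin its in-memory sequence at 0 again without reissuing an id from a previous epoch, provided the sequence never outgrows its fixed width.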

> ContainerId creation after work preserving restart is broken
> ------------------------------------------------------------
>                 Key: YARN-2052
>                 URL: https://issues.apache.org/jira/browse/YARN-2052
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Tsuyoshi OZAWA
> Container ids are made unique by using the app identifier and appending a monotonically
> increasing sequence number to it. Since container creation is a high churn activity the RM
> does not store the sequence number per app. So after restart it does not know what the new
> sequence number should be for new allocations.
