Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Wed, 18 Jun 2014 02:58:03 +0000 (UTC)
From: "Tsuyoshi OZAWA (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12713893.1399986079908.151289.1403060283703@arcas>
In-Reply-To: <JIRA.12713893.1399986079908@arcas>
References: <JIRA.12713893.1399986079908@arcas>
Subject: [jira] [Commented] (YARN-2052) ContainerId creation after work
 preserving restart is broken
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034746#comment-14034746 ] 

Tsuyoshi OZAWA commented on YARN-2052:
--------------------------------------

{quote}
We should make it a long in the same release as the epoch number addition so that we don't have to worry about that.
{quote}

+1 to doing this in the same release; we'll make that improvement in a separate JIRA. That said, I think it's important that we decide how the RM should behave when the ID overflows. We have two options: abort the RM for now, or start apps from a clean state after the restart. Since we plan to change the ID to a long right after this JIRA, we can take the abort approach for simplicity, which also prevents unexpected behavior. [~bikassaha], [~jianhe], what do you think?
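A minimal Java sketch of the abort option, assuming a hypothetical per-application int counter (SequenceOverflowSketch and IntSequence are illustrative names, not YARN classes). Instead of letting the counter wrap around to negative values, it fails fast once it reaches Integer.MAX_VALUE:

```java
/**
 * Hypothetical sketch of the "abort the RM on overflow" option.
 * Class and method names are illustrative, not the actual YARN code.
 */
public class SequenceOverflowSketch {

    /** Hypothetical int-based per-application container sequence counter. */
    static final class IntSequence {
        private int next;

        IntSequence(int start) {
            this.next = start;
        }

        /**
         * Returns the next sequence number. A plain next++ would silently
         * wrap to Integer.MIN_VALUE, so instead this throws once the
         * counter has reached Integer.MAX_VALUE.
         */
        int nextId() {
            if (next == Integer.MAX_VALUE) {
                // The "abort" option: fail fast rather than risk reusing an ID.
                throw new IllegalStateException(
                    "container sequence number overflow; aborting");
            }
            return next++;
        }
    }

    public static void main(String[] args) {
        IntSequence seq = new IntSequence(Integer.MAX_VALUE - 2);
        System.out.println(seq.nextId()); // 2147483645
        System.out.println(seq.nextId()); // 2147483646
        try {
            seq.nextId();
        } catch (IllegalStateException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```

Switching the counter to a long keeps the same shape but moves the overflow point far enough out that the abort path should, in practice, not be reached, which is why making the ID a long in the same release as the epoch matters.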

> ContainerId creation after work preserving restart is broken
> ------------------------------------------------------------
>
>                 Key: YARN-2052
>                 URL: https://issues.apache.org/jira/browse/YARN-2052
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Tsuyoshi OZAWA
>            Assignee: Tsuyoshi OZAWA
>         Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch
>
>
> Container IDs are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high-churn activity, the RM does not store the sequence number per app. So after a restart, it does not know what the sequence number for new allocations should be.
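To illustrate the restart problem and the epoch number mentioned earlier in the thread, here is a hedged Java sketch that combines a persisted RM epoch with the in-memory sequence number. The class name, method names, and the 40-bit sequence field are assumptions for illustration, not necessarily the layout used in the attached patches:

```java
/**
 * Hypothetical sketch: disambiguating container IDs across RM restarts
 * by combining a persisted RM epoch with the in-memory sequence number.
 * The 40-bit split and all names are illustrative assumptions.
 */
public class EpochContainerIdSketch {

    // Assumed layout: low 40 bits hold the per-app sequence number,
    // the remaining 24 high bits hold the RM epoch.
    static final int SEQUENCE_BITS = 40;
    static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;
    static final long MAX_EPOCH_EXCLUSIVE = 1L << (64 - SEQUENCE_BITS);

    /** Packs an epoch and a sequence number into one long ID. */
    static long toId(long epoch, long sequence) {
        if (epoch < 0 || epoch >= MAX_EPOCH_EXCLUSIVE) {
            throw new IllegalArgumentException("epoch out of range: " + epoch);
        }
        if (sequence < 0 || sequence > SEQUENCE_MASK) {
            throw new IllegalArgumentException("sequence out of range: " + sequence);
        }
        return (epoch << SEQUENCE_BITS) | sequence;
    }

    /** Extracts the epoch; unsigned shift so large epochs stay non-negative. */
    static long epochOf(long id) {
        return id >>> SEQUENCE_BITS;
    }

    /** Extracts the per-app sequence number from the low bits. */
    static long sequenceOf(long id) {
        return id & SEQUENCE_MASK;
    }

    public static void main(String[] args) {
        // Before the restart: epoch 0, sequence numbers start at 1.
        long before = toId(0, 1);
        // After the restart the in-memory sequence starts over at 1,
        // but the RM has bumped its persisted epoch to 1.
        long after = toId(1, 1);
        System.out.println(before);          // 1
        System.out.println(after);           // 1099511627777
        System.out.println(before != after); // true
        System.out.println(epochOf(after) + " " + sequenceOf(after)); // 1 1
    }
}
```

Only the epoch needs to be persisted (once per restart), so the high-churn sequence number can stay in memory as the description assumes.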


--
This message was sent by Atlassian JIRA
(v6.2#6252)