hadoop-yarn-issues mailing list archives

From "Tsuyoshi OZAWA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
Date Fri, 13 Jun 2014 21:49:02 GMT

    [ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031223#comment-14031223 ]

Tsuyoshi OZAWA commented on YARN-2052:

[~jianhe] and [~vinodkv], thank you for the comments and suggestions!

> This scheme won't work with a single reserved digit for epochs and a large number of restarts
> over time.

Yes, this is a case where integer overflow happens. We need to take that case into account.

> Old code (state-store, history-server etc) will not read it and that's fine. The only problem
> is users who are interpreting container_ID strings themselves. That is NOT supported. We should
> modify ConverterUtils to support the new-field, and that should do.

Adding RM Id + hostname as the epoch sounds like a reasonable approach to me. If we append the
epoch to the container id as a suffix, the following code is also valid with the old
{{ConverterUtils.toContainerId}}:

    ContainerId id = TestContainerId.newContainerId(0, 0, 0, 0);
    String cid = ConverterUtils.toString(id);
    ContainerId gen = ConverterUtils.toContainerId(cid + "_uuid_rm1");
    assertEquals(gen, id); // valid to parse even with old code

Therefore, I think {{container_XXX_000_uuid_rm1}} is a better format. I'll create a patch based
on this idea.
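
To illustrate why a trailing suffix can be harmless to an older parser, here is a minimal sketch. It assumes a parser that splits the string on underscores and reads only the first four numeric fields. The class {{ContainerIdSketch}} and its {{parse}} method are hypothetical stand-ins, not the actual {{ConverterUtils}} code, and the example ID strings are illustrative.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for a container ID parser.
// This is NOT the Hadoop ConverterUtils implementation.
final class ContainerIdSketch {

    // Parses "container_<clusterTs>_<appId>_<attemptId>_<containerSeq>[_<extra>...]"
    // and returns the four numeric fields. Tokens after the fourth number are ignored.
    static long[] parse(String s) {
        String[] parts = s.split("_");
        if (parts.length < 5 || !parts[0].equals("container")) {
            throw new IllegalArgumentException("Invalid container ID: " + s);
        }
        long[] fields = new long[4];
        for (int i = 0; i < 4; i++) {
            fields[i] = Long.parseLong(parts[i + 1]);
        }
        return fields;
    }

    public static void main(String[] args) {
        long[] plain = parse("container_0_0000_00_000000");
        long[] suffixed = parse("container_0_0000_00_000000_uuid_rm1");
        // Both strings yield the same four fields because the suffix is never read.
        System.out.println(Arrays.equals(plain, suffixed));
    }
}
```

A parser of this shape never looks at the extra tokens, so it also cannot tell apart two IDs that differ only in the suffix. Code that needs the epoch would have to read the new field explicitly.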

> ContainerId creation after work preserving restart is broken
> ------------------------------------------------------------
>                 Key: YARN-2052
>                 URL: https://issues.apache.org/jira/browse/YARN-2052
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Tsuyoshi OZAWA
>            Assignee: Tsuyoshi OZAWA
> Container ids are made unique by using the app identifier and appending a monotonically
> increasing sequence number to it. Since container creation is a high-churn activity, the RM
> does not store the sequence number per app. So after restart it does not know what the new
> sequence number should be for new allocations.
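
The problem described in the issue can be modelled with a small sketch. It assumes the per-application sequence counters live only in memory. {{SequenceCounterSketch}}, its {{allocate}} method, and the ID layout are hypothetical and only illustrate the collision; they are not the RM's real allocation code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the issue; names and ID layout are illustrative only.
final class SequenceCounterSketch {

    // Per-application counters are kept in memory and never persisted.
    private final Map<String, Integer> nextSeq = new HashMap<>();

    // Returns an ID built from the application key and the next sequence number.
    String allocate(String appKey) {
        int seq = nextSeq.merge(appKey, 1, Integer::sum);
        return String.format("container_%s_%06d", appKey, seq);
    }

    public static void main(String[] args) {
        String appKey = "1402000000000_0001_01";

        SequenceCounterSketch beforeRestart = new SequenceCounterSketch();
        String first = beforeRestart.allocate(appKey);

        // A restart discards the in-memory counters, so numbering starts over.
        SequenceCounterSketch afterRestart = new SequenceCounterSketch();
        String second = afterRestart.allocate(appKey);

        // The ID issued after the restart collides with one issued before it.
        System.out.println(first.equals(second));
    }
}
```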

This message was sent by Atlassian JIRA
