hadoop-yarn-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4758) Enable discovery of AMs by containers
Date Thu, 15 Sep 2016 00:35:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491898#comment-15491898 ]

Junping Du commented on YARN-4758:

I have attached a design doc after discussing it offline with Jian and Vinod. Comments are
welcome from anyone watching this JIRA.
In the meantime, I have started on a POC patch that aims to work end to end with
MAPREDUCE-6608 soon.

> Enable discovery of AMs by containers
> -------------------------------------
>                 Key: YARN-4758
>                 URL: https://issues.apache.org/jira/browse/YARN-4758
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Junping Du
>         Attachments: YARN-4758. AM Discovery Service for YARN Container.pdf
> {color:red}
> This is already discussed on the umbrella JIRA YARN-1489.
> Copying some of my condensed summary from the design doc (section of YARN-4692.
> {color}
> Even after the existing work on Work-preserving AM restart (Section 3.1.2 / YARN-1489),
we still haven’t solved the problem of old running containers not knowing where the new
AM starts running after the previous AM crashes. This is an especially important problem
to solve for long-running services, where we’d like to avoid killing service containers
when AMs fail over. So far, we have left this as a task for the apps, but solving it in YARN is
highly desirable. (Task) This looks very much like the service registry (YARN-913), but for app containers
to discover their own AMs.
> Combining this requirement (any container being able to find its AM across failovers)
with those of services (being able to find through DNS where a service container is running
- YARN-4757) will push our registry scalability needs much higher than those of just service
endpoints. This calls for a more distributed solution for registry readers, something that
is discussed in the comments section of YARN-1489 and MAPREDUCE-6608.
> See comment https://issues.apache.org/jira/browse/YARN-1489?focusedCommentId=13862359&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13862359
