hadoop-yarn-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks for continuously localized resources
Date Tue, 27 Sep 2016 00:23:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15524603#comment-15524603 ]

Chris Douglas commented on YARN-5621:

That summary of work seems about right, thanks for putting it together.

You raise excellent points about error handling. Your sketch includes a channel communicating
which resources were (un)successfully linked. The script-driven approach in v05 handles this
by writing a separate bash script and invoking the container executor (CE) for each symlink
(which, to be fair, isn't exactly "lightweight" compared to extending {{ContainerLocalizer}}).
In v05, a failure affects only one resource. But take your earlier example of linking a batch
of resources in one script: how would one handle partial failures? What is the state of the
container and its resources when the script invocation fails?
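To make the partial-failure question concrete, here is a minimal sketch, assuming a hypothetical {{BatchLinker}} that creates a batch of links in one call and reports a result per resource instead of a single exit status. {{BatchLinker}}, {{SymlinkOp}} and {{LinkResult}} are illustrative names, not part of YARN or the v05 patch; only {{java.nio.file.Files.createSymbolicLink}} is a real JDK call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: link a batch of localized resources and report each outcome. */
public final class BatchLinker {

  /** The single-link operation; injectable so the batch logic can be tested without a filesystem. */
  @FunctionalInterface
  public interface SymlinkOp {
    void link(Path link, Path target) throws IOException;
  }

  /** Default operation backed by java.nio (symlinks may be unsupported on some platforms). */
  public static final SymlinkOp NIO = (link, target) -> Files.createSymbolicLink(link, target);

  /** Outcome of one link attempt; error is null on success. */
  public static final class LinkResult {
    public final boolean ok;
    public final String error;

    LinkResult(boolean ok, String error) {
      this.ok = ok;
      this.error = error;
    }
  }

  /**
   * Attempts every link (link path -> localized target) instead of aborting on the first
   * failure, so the caller learns exactly which resources were (un)successfully linked.
   */
  public static Map<Path, LinkResult> linkAll(Map<Path, Path> links, SymlinkOp op) {
    Map<Path, LinkResult> results = new LinkedHashMap<>();
    for (Map.Entry<Path, Path> e : links.entrySet()) {
      try {
        op.link(e.getKey(), e.getValue());
        results.put(e.getKey(), new LinkResult(true, null));
      } catch (IOException | UnsupportedOperationException | SecurityException ex) {
        // Record the failure and keep going; the other links are still attempted.
        results.put(e.getKey(), new LinkResult(false, ex.toString()));
      }
    }
    return results;
  }
}
```

Returning one result per link, rather than one status for the whole batch, is what would let the caller decide which resources to retry or mark as failed; a single script exit code loses that information.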

On the ContainerLocalizer (CL) proposal: either the ContainerImpl (CI) initiates the symlink request
to the {{ResourceLocalizationService}} (RLS) after the download, or both operations are contained
within that service. The complexity is comparable. The 2-phase protocol you sketch (CI initiates
the download, then the link) opens a window in which the CL could shut down before it receives
the {{LINK}} commands (forcing a second CL launch), but even a short timeout would likely cover that.
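The timeout mitigation for the 2-phase protocol could look roughly like the minimal sketch below, assuming the localizer receives {{LINK}} commands on an in-memory queue after its downloads finish. {{LingeringLocalizer}}, {{submitLink}} and {{drainThenExit}} are hypothetical names; the real ContainerLocalizer talks to the NodeManager over heartbeats, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: after its downloads finish, a localizer lingers briefly for LINK commands. */
public final class LingeringLocalizer {
  private final BlockingQueue<String> linkCommands = new LinkedBlockingQueue<>();
  private final long graceMillis;

  public LingeringLocalizer(long graceMillis) {
    this.graceMillis = graceMillis;
  }

  /** Stand-in for a LINK command arriving from the NodeManager. */
  public void submitLink(String resource) {
    linkCommands.add(resource);
  }

  /**
   * Called once all downloads are done. Every LINK received within the grace period
   * restarts the wait; when the queue stays empty for graceMillis, the localizer exits,
   * and any LINK arriving later would require launching a new localizer.
   */
  public List<String> drainThenExit() {
    List<String> linked = new ArrayList<>();
    try {
      String next;
      while ((next = linkCommands.poll(graceMillis, TimeUnit.MILLISECONDS)) != null) {
        linked.add(next); // a real localizer would create the symlink here
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve interrupt status and exit early
    }
    return linked;
  }
}
```

The grace period trades a little idle time per localizer against the cost of a second launch when {{LINK}} commands arrive just after the downloads finish.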

A single message annotating the resource (download+symlink) could add states to {{LocalizedResource}},
which would then either notify starting containers directly (as the current code does) or hand off
to the RLS to create the symlink. In this case, the protocol to the {{ContainerImpl}} is simpler:
resending or retrying is idempotent, because the container does not care whether the download or
the symlink failed. Both {{FetchSuccessTransition}} and {{LocalizedResourceTransition}} would need
to send a {{LocalizerResourceRequestEvent}} so that running containers get their symlinks. To the
CI, a failed symlink would look like a failed download. Container start is unaffected.
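As a rough sketch of the single-message variant, assuming one combined download+symlink request per resource, the extra states might be modeled as below. The class {{CombinedLocalization}} and its enum values are hypothetical and do not match the real {{LocalizedResource}} state machine; the point is only that the container sees a single success/failure outcome, so a failed symlink and a failed download look the same, and a re-sent request simply restarts the sequence.

```java
/**
 * Hypothetical sketch: one request covers download + symlink, and the container
 * only ever sees success or failure of the combined operation.
 */
public final class CombinedLocalization {

  public enum State { INIT, DOWNLOADING, LINKING, LOCALIZED, FAILED }

  public enum Event { REQUEST, FETCH_SUCCESS, FETCH_FAILED, LINK_SUCCESS, LINK_FAILED }

  private State state = State.INIT;

  public State state() {
    return state;
  }

  /** Applies one event; events that do not apply in the current state are ignored. */
  public State handle(Event event) {
    switch (state) {
      case INIT:
      case FAILED:
        // A first request, or a resend after any failure, restarts download + symlink.
        if (event == Event.REQUEST) state = State.DOWNLOADING;
        break;
      case DOWNLOADING:
        if (event == Event.FETCH_SUCCESS) state = State.LINKING;
        else if (event == Event.FETCH_FAILED) state = State.FAILED;
        break;
      case LINKING:
        if (event == Event.LINK_SUCCESS) state = State.LOCALIZED;
        else if (event == Event.LINK_FAILED) state = State.FAILED;
        break;
      case LOCALIZED:
        // Already downloaded: a new request only needs the symlink step.
        if (event == Event.REQUEST) state = State.LINKING;
        break;
    }
    return state;
  }

  /** The container cannot tell a failed symlink from a failed download. */
  public boolean succeededForContainer() {
    return state == State.LOCALIZED;
  }
}
```

The {{LOCALIZED}} plus {{REQUEST}} transition loosely stands in for the {{LocalizedResourceTransition}} case: a resource that is already downloaded only needs the symlink step for the new requester.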

For the CL itself... sure, {{ResourceLocalizationSpec}} needs another field for symlinks.
This side is pretty straightforward, right?
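A minimal sketch of that extra field, assuming a plain Java value class rather than the real YARN record; {{LinkingLocalizationSpec}} and its accessors are hypothetical names, not the {{ResourceLocalizationSpec}} API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical stand-in for a localization spec extended with symlink information.
 * Field and method names are illustrative, not the real YARN API.
 */
public final class LinkingLocalizationSpec {
  private final String resource;       // e.g. the remote URI of the resource
  private final String destinationDir; // where the localizer downloads it
  private final List<String> symlinks; // new field: link names to create once downloaded

  public LinkingLocalizationSpec(String resource, String destinationDir, List<String> symlinks) {
    this.resource = Objects.requireNonNull(resource);
    this.destinationDir = Objects.requireNonNull(destinationDir);
    // Defensive, unmodifiable copy; an empty list means "download only, no links".
    this.symlinks = Collections.unmodifiableList(new ArrayList<>(symlinks));
  }

  public String getResource() {
    return resource;
  }

  public String getDestinationDirectory() {
    return destinationDir;
  }

  public List<String> getSymlinks() {
    return symlinks;
  }

  /** True when the localizer must create links after downloading. */
  public boolean needsLinking() {
    return !symlinks.isEmpty();
  }
}
```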

> Support LinuxContainerExecutor to create symlinks for continuously localized resources
> --------------------------------------------------------------------------------------
>                 Key: YARN-5621
>                 URL: https://issues.apache.org/jira/browse/YARN-5621
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch, YARN-5621.4.patch,
> When new resources are localized, a new symlink needs to be created for each localized resource.
> This is the change for the LinuxContainerExecutor to create the symlinks.

This message was sent by Atlassian JIRA

