hadoop-yarn-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks for continuously localized resources
Date Tue, 27 Sep 2016 00:23:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15524603#comment-15524603 ]

Chris Douglas commented on YARN-5621:
-------------------------------------

That summary of work seems about right, thanks for putting it together.

You raise excellent points about error handling. Your sketch includes a channel communicating
which resources were (un)successfully linked. The script-driven approach handles this in v05
by writing a separate bash script and invoking the container executor (CE) for each symlink
(which, to be fair, isn't exactly "lightweight" when compared to extending
{{ContainerLocalizer}}). In v05, a failure affects only one resource, but to take your earlier
example of linking a batch of resources in the script: how would one handle partial failures?
What's the state of the container and resources when the script invocation fails?
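
To make the partial-failure question concrete, here is a minimal sketch of a batch link that
reports a per-resource outcome instead of a single exit status. It is not taken from any of the
attached patches: {{BatchLinker}}, {{LinkOp}}, and {{Result}} are hypothetical names, and the
only real API used is the JDK's {{Files.createSymbolicLink}}.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of YARN: links a batch and records each outcome. */
public class BatchLinker {

  /** Abstraction over the actual link call, so failures can be simulated. */
  public interface LinkOp {
    void link(Path link, Path target) throws IOException;
  }

  /** Outcome of one batch: which links were created and which failed, with the reason. */
  public static final class Result {
    public final List<Path> linked = new ArrayList<>();
    public final Map<Path, String> failed = new LinkedHashMap<>();

    /** True when some, but not all, of the requested links were created. */
    public boolean isPartial() {
      return !linked.isEmpty() && !failed.isEmpty();
    }
  }

  private final LinkOp op;

  public BatchLinker(LinkOp op) {
    this.op = op;
  }

  /** Attempts every link; a failure on one resource does not abort the rest. */
  public Result linkAll(Map<Path, Path> linkToTarget) {
    Result result = new Result();
    for (Map.Entry<Path, Path> e : linkToTarget.entrySet()) {
      try {
        op.link(e.getKey(), e.getValue());
        result.linked.add(e.getKey());
      } catch (IOException | RuntimeException ex) {
        result.failed.put(e.getKey(), String.valueOf(ex.getMessage()));
      }
    }
    return result;
  }

  /** Variant backed by the JDK's symlink call. */
  public static BatchLinker usingFilesystem() {
    return new BatchLinker((link, target) -> Files.createSymbolicLink(link, target));
  }
}
```

With a result like this, the caller still has to decide what a partial batch means for the
container, which is exactly the open question above; the sketch only shows that the information
needs to come back per resource.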

On the CL ({{ContainerLocalizer}}) proposal: either the CI ({{ContainerImpl}}) initiates the
symlink request to the {{ResourceLocalizationService}} (RLS) after download, or the two
operations are contained within that service. The complexity is comparable. The 2-phase
protocol you sketch (CI initiates download, then link) adds a window in which the CL could be
shut down before it receives the {{LINK}} commands (causing two CL launches), but even a short
timeout would likely cover that.

A single message annotating the resource (download+symlink) could add states to {{LocalizedResource}}
if it were to notify starting containers directly (current code) or hand off to the RLS for
symlink. In this case, the protocol to the {{ContainerImpl}} is simpler (resending/retry is
idempotent because the CI doesn't care whether the download or the symlink failed). Both
{{FetchSuccessTransition}} and {{LocalizedResourceTransition}} would need to send
{{LocalizerResourceRequestEvent}} for running containers to symlink. A failed symlink would
look like a failed download to the CI. Start container is unaffected.
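
A rough sketch of what I mean by the single-message variant, under loud assumptions: this is
not YARN code, {{LinkAwareResourceTracker}} and the {{LINKING}} state are hypothetical, and the
real {{LocalizedResource}} state machine is event-driven rather than a map of strings. The
point is only that the requester sees one outcome regardless of which phase failed, and that
resending a request is harmless.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch, not YARN code: one request covers download plus symlink. */
public class LinkAwareResourceTracker {

  /** LINKING is an assumed extra state added for the symlink phase. */
  public enum State { INIT, DOWNLOADING, LINKING, LOCALIZED, FAILED }

  /** What the requesting container is told; it cannot tell which phase failed. */
  public enum Outcome { PENDING, SUCCESS, FAILURE }

  private final Map<String, State> states = new HashMap<>();

  /**
   * Idempotent request: repeating it while the resource is in flight or already
   * localized changes nothing; a previously failed resource is simply retried.
   */
  public Outcome request(String resource) {
    State s = states.getOrDefault(resource, State.INIT);
    if (s == State.INIT || s == State.FAILED) {
      states.put(resource, State.DOWNLOADING);
    }
    return outcomeOf(resource);
  }

  /** Download completion moves the resource on to linking, or fails it. */
  public void downloadFinished(String resource, boolean ok) {
    if (states.get(resource) == State.DOWNLOADING) {
      states.put(resource, ok ? State.LINKING : State.FAILED);
    }
  }

  /** Symlink completion localizes the resource, or fails it the same way a download would. */
  public void linkFinished(String resource, boolean ok) {
    if (states.get(resource) == State.LINKING) {
      states.put(resource, ok ? State.LOCALIZED : State.FAILED);
    }
  }

  /** Collapses the internal states into the single outcome the requester sees. */
  public Outcome outcomeOf(String resource) {
    State s = states.getOrDefault(resource, State.INIT);
    switch (s) {
      case LOCALIZED:
        return Outcome.SUCCESS;
      case FAILED:
        return Outcome.FAILURE;
      default:
        return Outcome.PENDING;
    }
  }
}
```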

For the CL itself... sure, {{ResourceLocalizationSpec}} needs another field for symlinks.
This side is pretty straightforward, right?

> Support LinuxContainerExecutor to create symlinks for continuously localized resources
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-5621
>                 URL: https://issues.apache.org/jira/browse/YARN-5621
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch, YARN-5621.4.patch, YARN-5621.5.patch
>
>
> When new resources are localized, a new symlink needs to be created for each localized resource.
> This is the change for the LinuxContainerExecutor to create the symlinks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

