hadoop-yarn-issues mailing list archives

From "Hitesh Sharma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5501) Container Pooling in YARN
Date Wed, 08 Feb 2017 21:28:41 GMT

    [ https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858564#comment-15858564

Hitesh Sharma commented on YARN-5501:

Hi [~jlowe],

First of all, a big thanks for taking the time to look at the document and share your thoughts.
I appreciate it a lot.

bq. I am confused on how this will be used in practice.  To me pre-initialized containers
means containers that have already started up with application- or framework-specific resources
localized, processes have been launched using those resources, and potentially connections
already negotiated to external services.  I'm not sure how YARN is supposed to know what mix
of local resources, users, and configs to use for preinitialized containers that will get
a good "hit rate" on container requests.  Maybe I'm misunderstanding what is really meant
by "preinitialized," and some concrete, sample use cases with detailed walkthroughs of how
they work in practice would really help crystallize the goals here.

Your understanding of pre-initialized containers is correct here. In the proposed design, the
YARN RM has the config to start pre-initialized containers, and this config is essentially a
launch context: it contains the launch commands, the details of the resources to localize, and
the resource constraints with which the container should be started. This configuration is
currently static, but in the future we intend to make it pluggable, so we can extend it
to be dynamic and adjust based on cluster load.
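As a rough illustration, here is what such a static pre-init config could carry; this is a
sketch only, and all field names here are hypothetical, not actual YARN configuration keys:

```python
# A minimal model of the information a static pre-init container config holds.
# Field names are hypothetical; YARN's real launch context is a protobuf-backed
# Java structure. This just shows the pieces described above: launch commands,
# resources to localize, and resource constraints.
preinit_config = {
    "application_type": "MY_APP_TYPE",         # which app type these containers serve
    "launch_command": ["bash", "warmup.sh"],   # starts the long-lived process a priori
    "resources_to_localize": [
        "hdfs:///apps/myapp/framework.tar.gz",
    ],
    "resource_constraints": {"vcores": 2, "memory_mb": 2048},
    "pool_size_per_node": 1,                   # how many warm containers to keep per NM
}
```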

The first use case happens to be a scenario where each container needs to start some
processes that take a long time to initialize (localization and startup costs). The YARN NM
receives the config to start the pre-initialized container (there is a dummy application
associated with the pre-init container for a specific application) and follows the regular
code paths for a container, which include localizing resources and launching the container.
As you know, in YARN a container goes to the RUNNING state once started, but a pre-initialized
container instead goes to a PREINITIALIZED state (there are some hooks which let us know
that the container has initialized properly). From this point the container is no different
from a regular container, as the container monitor is watching it. The "Pool Manager" within
the YARN NM is used to start the pre-initialized container and watches for container events like
stop, in which case it simply tries to start it again. In other words, at the moment we simply
use the YARN RM to pick the nodes where a pre-initialized container should be started and let the
"Pool Manager" in the NM manage the lifecycle of the container.
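The Pool Manager behavior described above can be sketched as a small model; this is a
simplified illustration, and the class and method names are hypothetical, not the actual
NM implementation:

```python
# Simplified model of the NM-side "Pool Manager": it launches pre-initialized
# containers and, when one stops, launches a replacement to keep the pool full.
class PoolManager:
    def __init__(self, launcher):
        self.launcher = launcher           # callable that launches one container, returns its ID
        self.live = set()                  # IDs of containers currently pre-initialized

    def start_preinit_container(self):
        cid = self.launcher()
        self.live.add(cid)
        return cid

    def on_container_event(self, cid, event):
        if event == "STOP" and cid in self.live:
            self.live.discard(cid)         # the pooled container died...
            self.start_preinit_container() # ...so start another to keep the pool at size

    def on_detach(self, cid):
        # container handed off to an application; it leaves the pool
        self.live.discard(cid)

# tiny usage example: the launcher just hands out fresh integer IDs
_ids = iter(range(1000))
pm = PoolManager(lambda: next(_ids))
first = pm.start_preinit_container()
pm.on_container_event(first, "STOP")       # pool replaces the dead container
```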

When the AM for which we pre-initialized the container comes along and asks for it, the
"Container Manager" takes the pre-initialized container by issuing a "detach" container
event and "attaches" it to the application. We added attachContainer and detachContainer events
to ContainerExecutor, which allow us to define what they mean. As an example, in attachContainer
we start a new process within the cgroup of the pre-initialized container. The PID-to-container
mapping within the ContainerExecutor is updated to reflect everything accordingly (pre-initialized
containers have a different container ID and belong to a different application before they
are taken up). As part of detachContainer, all the resources associated with the pre-initialized
container become associated with the new container and get cleaned up accordingly.
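The attachContainer semantics described above can be modeled roughly as follows; this is a
toy sketch, and the class and path names are hypothetical stand-ins for the real cgroup
filesystem operations:

```python
# Toy model of attachContainer: the new process is started inside the cgroup
# that the pre-initialized container already owns, so the existing limits and
# accounting apply to it immediately.
class CgroupModel:
    def __init__(self):
        self.procs = {}                    # cgroup path -> set of member PIDs

    def create(self, path, initial_pid):
        self.procs[path] = {initial_pid}

    def attach_process(self, path, pid):
        # stands in for writing the PID into the cgroup's cgroup.procs file
        self.procs[path].add(pid)

cg = CgroupModel()
cg.create("/yarn/container123", 4321)           # pre-init process, started a priori
cg.attach_process("/yarn/container123", 8765)   # new process added on attach
```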

The other use case where we have prototyped container pooling is the scenario where a container
actually needs to be a virtual machine. Creating VMs can take a long time, so container
pooling allows us to keep empty VM shells ready to go.

bq. Reusing containers across different applications is going to create some interesting scenarios
that don't exist today.  For example, what does a container ID for one of these look like?
 How many things today assume that all container IDs for an application are essentially prefixed
by the application ID?  This would violate that assumption, unless we introduce some sort
of container ID aliasing where we create a "fake" container ID that maps to the "real" ID
of the reused container.  It would be good to know how we're going to treat container IDs
and what applications will see when they get one of these containers in response to their
allocation request.

All pre-initialized containers belong to a specific application type. There is a dummy application
created to which the pre-initialized containers are mapped. As part of the containerAttach and
containerDetach events we disassociate the container from one application and associate it with
the other. Specifically, ContainerExecutor has a mapping of container ID to PID file, and as part
of container detach we update this mapping. For example, say the pre-init container had the ID
container123; the mapping in the executor would be container123=../container123.pidfile, but as
part of container attach and detach we update this mapping so that it now looks like
newcontainer456=../container123.pidfile. The ContainerExecutors use this mapping to locate the
cgroup or Windows job object, and thus all container events are now issued on the pre-init
container (i.e. the already started process).
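The remap described above amounts to re-keying the executor's mapping while the pidfile (and
hence the cgroup or job object it points at) stays put; a minimal sketch, with a hypothetical
helper name:

```python
# Sketch of the executor's container-ID -> pidfile re-key on attach/detach.
# Only the key changes; the pidfile still points at the already-running
# process, so later container events land on it.
def reassign_container(mapping, preinit_id, new_id):
    mapping[new_id] = mapping.pop(preinit_id)
    return mapping

m = {"container123": "../container123.pidfile"}
reassign_container(m, "container123", "newcontainer456")
# m is now {"newcontainer456": "../container123.pidfile"}
```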

bq. What happens for preinitialized container failures, both during application execution
and when idle?  Do we let the application launch its own recovery container, etc.

Eventually we want the YARN RM to manage these things, but for now in our PoC we have the "Pool
Manager", which listens for these events and simply keeps retrying. At the moment we only
use the YARN RM to select nodes and pass the config down.

bq. How does resource accounting/scheduling work with these containers?  Are they running
in a dedicated queue?  Can users go beyond their normal limits by getting these containers
outside of the app's queue?  Will it look weird when the user's queue isn't full yet we don't
allow them any more containers because they're already using the maximum number of preinitialized
containers?

We need to figure this part out. One thought we have had is that pre-init containers could be
considered opportunistic, which means they can get killed in favor of other containers,
but if they do get used then they take on the identity of the new container.

bq. Could you elaborate on how the resizing works?  How do the processes running within the
container being resized made aware of the new size constraints?  Today containers don't communicate
with the NM directly, so I'm not sure how the preinitialized containers are supposed to know
they are suddenly half the size or can now leverage more memory than they could before.  Without
that communication channel it seems like we're either going to kill processes that overflowed
their lowered memory constraint or we're going to waste cluster resources because the processes
are still trying to fit within the old memory size.

Strictly speaking we haven't prototyped this part, but the idea is to reuse the container
allocation increase mechanisms. For example, if the pre-init container was running with 2 cores
and 2 GB, then after "attach" it could have its resources increased to 4 cores and 4 GB. The
resizing is simply at the job object or cgroup level, and we expect the application to have its
own communication channel to talk to the processes that were started a priori.
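A cgroup-level resize of this kind boils down to rewriting the controller limit files; the
sketch below models that with a dict standing in for the filesystem, and the helper name is
hypothetical:

```python
# Model of resizing at the cgroup (v1) level only: update the CPU quota and
# memory limit for the container's cgroup. The processes inside are not
# notified; the application's own channel must tell them about the new budget.
def resize_cgroup(cgroup_files, vcores, memory_mb):
    # quota = vcores * period, with the conventional 100ms (100000us) period
    cgroup_files["cpu.cfs_quota_us"] = vcores * 100_000
    cgroup_files["memory.limit_in_bytes"] = memory_mb * 1024 * 1024
    return cgroup_files

files = {}
resize_cgroup(files, 4, 4096)   # 2 cores / 2 GB pre-init -> 4 cores / 4 GB after attach
```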

bq. What are the security considerations?  Are preinitialized containers tied to a particular
user?  How are app-specific credentials conveyed to the preinitialized container, and can
credentials leak between apps that use the same container?

We haven't prototyped this part as much. Currently we start the pre-init containers by skipping
some of the security checks done in the "Container Manager". I think we can instead configure
the user under which pre-init containers should be started and then associate them with the
actual application.

bq. Is this some separate, new protocol for advertising or is this just simply reporting the
container is launched just like other container status today?  The RM already knows it sent
the NM a command to launch the container, so it seems this is just the NM reporting the state
of the container is now launched as it does for any other container start request today, but
I wasn't sure if that is what was meant here.

This is a kind of reporting back to the RM that the pre-init container is ready. We have been
thinking of using [YARN-3926] to advertise the pre-init containers as resources so they can
be requested by the AMs.

bq. I'm confused, I thought the preinitialized container is already launched, but this talks
about launching it after attach.  Again a concrete use-case walkthrough would help clarify
what's really being proposed.  If this is primarily about reducing localization time instead
of process startup after localization then there are simpler approaches we can take.

You are correct that the pre-init containers are already started. But in our scenarios a container
can have multiple processes running within the same cgroup or job object. With container pooling
we start some of these processes a priori, and the remaining ones are started when the AM comes
around asking for containers. The containers for our application look for the processes started
a priori and communicate with them.

Again, thanks a ton for the excellent feedback; we look forward to discussing this more. We can
refine some of the ideas here and make them more generic and useful to the community.

> Container Pooling in YARN
> -------------------------
>                 Key: YARN-5501
>                 URL: https://issues.apache.org/jira/browse/YARN-5501
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Arun Suresh
>            Assignee: Hitesh Sharma
>         Attachments: Container Pooling - one pager.pdf
> This JIRA proposes a method for reducing the container launch latency in YARN. It introduces
a notion of pooling *Unattached Pre-Initialized Containers*.
> Proposal in brief:
> * Have a *Pre-Initialized Container Factory* service within the NM to create these unattached
pre-initialized containers.
> * The NM would then advertise these containers as special resource types (this should
be possible via YARN-3926).
> * When a start container request is received by the node manager for launching a container
requesting this specific type of resource, it will take one of these unattached pre-initialized
containers from the pool, and use it to service the container request.
> * Once the request is complete, the pre-initialized container would be released and ready
to serve another request.
> This capability would help reduce container launch latencies and thereby allow for development
of more interactive applications on YARN.
