hadoop-yarn-issues mailing list archives

From "Hitesh Sharma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5501) Container Pooling in YARN
Date Thu, 09 Feb 2017 03:06:41 GMT

    [ https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858926#comment-15858926 ]

Hitesh Sharma commented on YARN-5501:
-------------------------------------

[~jlowe], thanks for the great feedback and for taking the time to respond.

Here are some more details on how container attach and detach actually work.

PoolManager creates the pre-initialized containers, and they are not different from regular containers in any real way. When ContainerManager receives a startContainer request, it issues a DETACH_CONTAINER event. The detach exists to ensure that we can clean up the state associated with the pre-init container while avoiding cleanup of the localized resources. ContainerManager listens for the CONTAINER_DETACHED event, and once it receives it, it creates the ContainerImpl for the requested container, passing the information related to the detached container to the ContainerImpl c'tor. The ContainerManager then follows the regular code paths for starting the container, which means resource localization happens for the new container; but when it comes time to raise the launch event, the ContainerImpl instead raises an ATTACH_CONTAINER event. This allows the ContainersLauncher to call attachContainer on the executor, which is where we make the choice of launching the other processes required for that container. I hope this helps clarify things a little bit more.
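
To make the ordering above concrete, here is a rough, self-contained sketch of the detach-then-attach flow. The names below (PoolEvent, PooledContainer, ContainerManagerSketch) are purely illustrative and do not match the actual ContainerManager/ContainerImpl/ContainersLauncher code; this only shows the event sequence, not the real state machine.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative events, mirroring DETACH_CONTAINER / CONTAINER_DETACHED / ATTACH_CONTAINER.
enum PoolEvent { DETACH_CONTAINER, CONTAINER_DETACHED, ATTACH_CONTAINER }

class PooledContainer {
    final String id;
    final Map<String, String> localizedResources; // preserved across detach
    PooledContainer(String id, Map<String, String> resources) {
        this.id = id;
        this.localizedResources = resources;
    }
}

public class ContainerManagerSketch {
    private final Map<String, PooledContainer> pool = new ConcurrentHashMap<>();

    // A startContainer request first triggers a detach of a pre-init container.
    void startContainer(String requestedId) {
        pool.values().stream().findFirst()
            .ifPresent(preInit -> handle(PoolEvent.DETACH_CONTAINER, preInit, requestedId));
    }

    void handle(PoolEvent event, PooledContainer preInit, String requestedId) {
        switch (event) {
            case DETACH_CONTAINER:
                // Clean up pre-init bookkeeping but keep the localized resources.
                pool.remove(preInit.id);
                handle(PoolEvent.CONTAINER_DETACHED, preInit, requestedId);
                break;
            case CONTAINER_DETACHED:
                // The new container is constructed with the detached container's
                // info, then goes through normal localization for job-specific bits.
                System.out.println("Creating " + requestedId + " reusing " + preInit.id);
                handle(PoolEvent.ATTACH_CONTAINER, preInit, requestedId);
                break;
            case ATTACH_CONTAINER:
                // The executor attaches to existing processes instead of launching anew.
                System.out.println("Attaching " + requestedId + " to " + preInit.id);
                break;
        }
    }

    public static void main(String[] args) {
        ContainerManagerSketch cm = new ContainerManagerSketch();
        cm.pool.put("container1234",
            new PooledContainer("container1234", Map.of("framework.jar", "/local/cache")));
        cm.startContainer("containerABCD");
    }
}
{code}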

bq. I'm thinking of a use-case where the container is a base set that applies to all instances
of an app framework, but each app may need a few extra things localized to do an app-specific
thing (think UDFs for Hive/Pig, etc.). Curious if that is planned and how to deal with the
lifecycle of those "extra" per-app things.

Yes, the base set of things applies to all instances of the app framework, but localization is still done for each instance. So you can, for example, download a set of binaries via pre-initialization, while more job-specific things come later.
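
As a toy illustration of this two-phase localization (the resource names are hypothetical, and this is not the actual localization API):

{code:java}
import java.util.ArrayList;
import java.util.List;

public class TwoPhaseLocalization {
    // Pre-init resources are already on disk; per-job resources are localized
    // only when the actual start container request arrives.
    static List<String> effectiveResources(List<String> preInit, List<String> perJob) {
        List<String> all = new ArrayList<>(preInit);
        all.addAll(perJob);
        return all;
    }

    public static void main(String[] args) {
        System.out.println(effectiveResources(
            List.of("framework.tar.gz", "runtime.jar"), // downloaded at pre-init
            List.of("my-udfs.jar")));                   // job-specific, e.g. Hive UDFs
    }
}
{code}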

bq. So it sounds like there is a new container ID generated in the application's container
namespace as part of the "allocation" to fill the app's request, but this container ID is
aliased to an already existing container ID in another application's namespace, not only at
the container executor level but all the way up to the container ID seen at the app level,
correct?

The application gets a container ID from the YARN RM and uses that for all purposes. On the NM we internally switch to using the pre-init container ID as the PID. For example, say the pre-init container had the ID container1234 while the AM-requested container had the ID containerABCD. Even though we reuse the existing pre-init container1234 to service the start container request on the NM, we never surface container1234 to the application; the app always sees containerABCD.
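
A minimal sketch of that aliasing, assuming a simple map on the NM side (the real implementation lives in the NM state machine, not a standalone class like this):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ContainerIdAlias {
    // app-visible ID -> pre-init container ID that actually backs it
    private final Map<String, String> alias = new ConcurrentHashMap<>();

    void attach(String appVisibleId, String preInitId) {
        alias.put(appVisibleId, preInitId);
    }

    // NM-internal operations (signalling, monitoring) resolve through here;
    // the application itself only ever sees the app-visible ID.
    String resolve(String appVisibleId) {
        return alias.getOrDefault(appVisibleId, appVisibleId);
    }

    public static void main(String[] args) {
        ContainerIdAlias a = new ContainerIdAlias();
        a.attach("containerABCD", "container1234");
        System.out.println(a.resolve("containerABCD")); // prints container1234
    }
}
{code}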

bq. One idea is to treat these things like the page cache in Linux. In other words, we keep
a cache of idle containers as apps run them. These containers, like page cache entries, will
be quickly discarded if they are unused and we need to make room for other containers. We're
simply caching successful containers that have been run on the cluster, ready to run another
task just like it. Apps would still need to make some tweaks to their container code so it
talks the yet-to-be-detailed-and-mysterious attach/detach protocol so they can participate
in this automatic container cache, and there would need to be changes in how containers are
requested so the RM can properly match a request to an existing container (something that
already has to be done for any reuse approach). Seems like it would adapt well to shifting
loads on the cluster and doesn't require a premeditated, static config by users to get their
app load to benefit. Has something like that been considered?

That is a very interesting idea. If the app can provide some hints as to when it is appropriate to consider a container pre-initialized, then when the container finishes we can carry out the required operations to return it to the pre-init state. Thanks for bringing this up.
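
Just to sketch what such an idle-container cache could look like, here is a toy LRU built on LinkedHashMap. This is a thought experiment on your page-cache analogy, not something in the current PoC:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class IdleContainerCache extends LinkedHashMap<String, String> {
    private final int capacity;

    IdleContainerCache(int capacity) {
        super(16, 0.75f, true); // access-order, like LRU page replacement
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        // Evict the least recently used idle container when we need room.
        boolean evict = size() > capacity;
        if (evict) {
            System.out.println("Evicting idle container " + eldest.getKey());
        }
        return evict;
    }

    public static void main(String[] args) {
        IdleContainerCache cache = new IdleContainerCache(2);
        cache.put("container1", "hive-udf-shape"); // key: id, value: container "shape"
        cache.put("container2", "pig-shape");
        cache.get("container1");                   // touch: now most recently used
        cache.put("container3", "spark-shape");    // evicts container2
    }
}
{code}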

bq. I think that's going to be challenging for the apps in practice and will limit which apps can leverage this feature reliably. This is going to be challenging for containers running VMs whose memory limits need to be set up at startup (e.g.: JVMs). Minimally I think this feature needs a way for apps to specify that they do not have a way to communicate (or at least act upon) memory changes. In those cases YARN will have to decide on tradeoffs like a primed-but-oversized container that will run fast but waste grid resources and also avoid reusing a container that needs to grow to satisfy the app request.

Hmm, let me look at the code and see how container resizing works today. What you are saying makes sense, but in that case container resizing won't work either. For our scenarios resource constraints are enforced via job objects or cgroups, so things are OK.
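
For illustration, on the cgroups side resizing a reused container would amount to rewriting the memory limit file. The sketch below assumes a cgroup v1 layout and a hypothetical cgroup name; the real NM would go through the container-executor rather than writing the file directly:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CgroupMemoryLimit {
    static void setLimit(String cgroupName, long bytes) throws IOException {
        // cgroup v1 memory controller; paths differ on cgroup v2 systems.
        Path limitFile = Path.of("/sys/fs/cgroup/memory", cgroupName,
                                 "memory.limit_in_bytes");
        Files.writeString(limitFile, Long.toString(bytes));
    }

    public static void main(String[] args) throws IOException {
        // Grow a reused container's limit to 4 GiB before attaching new work.
        setLimit("yarn/container1234", 4L * 1024 * 1024 * 1024);
    }
}
{code}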

bq. Also the container is already talking this yet-to-be-detailed attach/detach protocol,
so I would expect any memory change request to also arrive via that communication channel.
Why isn't that the case?

I gave some details above of how attach/detach works; it is not really a protocol but rather a set of state machine changes that ensure the YARN machinery is updated accordingly.

bq. Making sure we don't mix users is the most basic step, but there's still the issue of
credentials. There needs to be a way to convey app-specific credentials to these containers
and make sure they don't leak between apps. The security design should be addressed sooner
rather than later, because it's going to be difficult to patch it in after the fact.

I agree we need to do more thinking here. Let me get back to you on this.

bq. It sounds like you already have a working PoC and scenarios for it. These would be great
to detail via flow/message sequence diagrams detailing the operation order for container init,
attach, detach, restart, etc. It would also be great to detail what changes apps using this
feature will see over what they do today (i.e.: if there's something changing re: container
IDs, container killing, etc.) and what changes are required on their part in order to participate.

In practice we are using container pooling as a pure optimization. As I mentioned earlier, one of our use cases involves starting some heavy processes, waiting for them to come up, and then doing the actual work by launching other processes within the same cgroup or job object. With pooling, our AM requests a pre-init container, and since we have a static config for how many pre-init containers are running, it may or may not receive one. In the container launch command we check for the presence of these heavy processes, and if they are found we skip initializing them, which saves quite a bit of time.
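
A rough Java rendition of that launch-time check (the pidfile path is hypothetical; in our PoC the check is done by the launch command itself):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

public class SkipHeavyInit {
    static boolean heavyProcessAlive(Path pidFile) {
        try {
            long pid = Long.parseLong(Files.readString(pidFile).trim());
            Optional<ProcessHandle> handle = ProcessHandle.of(pid);
            return handle.map(ProcessHandle::isAlive).orElse(false);
        } catch (IOException | NumberFormatException e) {
            return false; // no pidfile or unreadable: assume not running
        }
    }

    public static void main(String[] args) {
        Path pidFile = Path.of("/var/run/heavy-service.pid"); // hypothetical
        if (heavyProcessAlive(pidFile)) {
            System.out.println("Heavy process already up; skipping init");
        } else {
            System.out.println("Cold start: launching heavy process");
        }
    }
}
{code}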

Please note that this is a PoC to put some of the ideas into practice. We are eager to see
how this can be evolved into something more generic that is useful to the community. I will
be happy to share some details and maybe post a WIP patch to give some clarity.

Open to ideas and suggestions.

> Container Pooling in YARN
> -------------------------
>
>                 Key: YARN-5501
>                 URL: https://issues.apache.org/jira/browse/YARN-5501
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Arun Suresh
>            Assignee: Hitesh Sharma
>         Attachments: Container Pooling - one pager.pdf
>
>
> This JIRA proposes a method for reducing the container launch latency in YARN. It introduces a notion of pooling *Unattached Pre-Initialized Containers*.
> Proposal in brief:
> * Have a *Pre-Initialized Container Factory* service within the NM to create these unattached containers.
> * The NM would then advertise these containers as special resource types (this should be possible via YARN-3926).
> * When a start container request is received by the node manager for launching a container requesting this specific type of resource, it will take one of these unattached pre-initialized containers from the pool, and use it to service the container request.
> * Once the request is complete, the pre-initialized container would be released and ready to serve another request.
> This capability would help reduce container launch latencies and thereby allow for development of more interactive applications on YARN.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
