hadoop-yarn-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling
Date Wed, 11 Dec 2013 02:40:08 GMT

    [ https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845011#comment-13845011 ]

Vinod Kumar Vavilapalli commented on YARN-1404:
-----------------------------------------------

bq. I'm not sure I entirely understand what you mean by create a new level of trust.
I thought that was already clear to everyone. See my comment [here|https://issues.apache.org/jira/browse/YARN-1404?focusedCommentId=13840905&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13840905].
"YARN depends on the ability to enforce resource-usage restrictions".

YARN enables both resource scheduling and enforcement of those scheduling decisions. If resources sit outside of YARN, YARN cannot enforce limits on their usage. For example, YARN cannot enforce the memory usage of a DataNode. People may work around that by setting up cgroups on these daemons, but that defeats the purpose of using YARN in the first place. That is why I earlier proposed that Impala/DataNode run under YARN. When I couldn't find a solution otherwise, I revised my proposal to restrict the feature to a special ACL, so that other apps don't abuse the cluster by requesting unmanaged containers and then not using those resources.
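
To make that concrete, the ACL gate I have in mind would look roughly like this on the scheduler side. This is only a sketch: the unmanaged flag on the request and the REQUEST_UNMANAGED queue ACL are hypothetical, nothing here exists in YARN today.
{code:java}
import org.apache.hadoop.security.AccessControlException;
import org.apache.hadoop.security.UserGroupInformation;

// Hypothetical scheduler-side gate: only queues whose ACL grants a dedicated
// "request unmanaged" action may acquire containers with no backing process.
// ResourceRequest.isUnmanaged() and QueueACL.REQUEST_UNMANAGED do not exist
// today; both stand in for whatever shape the proposal finally takes.
void validateUnmanagedRequest(ResourceRequest req, Queue queue,
    UserGroupInformation user) throws AccessControlException {
  if (req.isUnmanaged()
      && !queue.hasAccess(QueueACL.REQUEST_UNMANAGED, user)) {
    throw new AccessControlException(user.getShortUserName()
        + " may not request unmanaged containers in queue "
        + queue.getQueueName());
  }
}
{code}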

bq. It depends on that or the AM releasing the resources. Process liveliness is a very imperfect
signifier ...
We cannot trust AMs to always release containers. If liveliness really were so imperfect, we should change YARN as it is today to not depend on it. I'd leave it as an exercise to see, once we remove process-liveliness in general, how apps will release containers and how clusters get utilized. Bonus points for trying it on a shared multi-tenant cluster with user-written YARN apps.

My point is that process liveliness, plus accounting based on it, is a well-understood model in Hadoop land. The proposal for leases is to continue that.
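
To illustrate, the lease idea amounts to RM-side bookkeeping along these lines. All names here are hypothetical; this only sketches the mechanism and is not proposed code.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of lease-based accounting for unmanaged containers: instead of
// NM-reported process liveliness, the AM must keep renewing a lease for each
// unmanaged container on its heartbeat; the RM reclaims anything that expires.
public class UnmanagedContainerLeases {
  private final long leaseTimeoutMs;
  private final Map<Long, Long> lastRenewalMs = new ConcurrentHashMap<>();

  public UnmanagedContainerLeases(long leaseTimeoutMs) {
    this.leaseTimeoutMs = leaseTimeoutMs;
  }

  /** Called for every unmanaged container listed on an AM heartbeat. */
  public void renew(long containerId) {
    lastRenewalMs.put(containerId, System.currentTimeMillis());
  }

  /** Run periodically: expired leases give their capacity back to the pool. */
  public void reclaimExpired() {
    long now = System.currentTimeMillis();
    lastRenewalMs.entrySet().removeIf(e -> {
      boolean expired = now - e.getValue() > leaseTimeoutMs;
      if (expired) {
        release(e.getKey()); // same effect as a container-finished event today
      }
      return expired;
    });
  }

  private void release(long containerId) {
    // Hypothetical hook: tell the scheduler to free this container's resources.
  }
}
{code}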

bq. Is there a scenario I'm missing here?
One example illustrates this: today, AMs can go away without releasing containers, and YARN can kill the corresponding containers (as they are managed). If we don't have some kind of lease, and an AM that holds unmanaged resources goes away without an explicit container-release, those resources are leaked.

bq. YARN is not a power-hungry conscious entity that gets to make decisions for us. Not simply
when a use case violates the abstract idea of YARN controlling everything. [...]
Of course, when I say YARN, I mean the YARN community. You are taking it too literally.

I was pointing at your statements that "Impala currently has little tangible to gain by doing deployment and enforcement inside YARN" and "However, making Impala-YARN integration depend on this fairly involved work would unnecessarily set it back". The YARN community doesn't make decisions based on those things.

Overall, I didn't originally have a complete solution for making this happen, so I came up with ACLs and leases. But delegation, as proposed by Arun, seems like one that solves all the problems. Other than saying you don't want to wait for Impala-under-YARN integration, I haven't heard any technical reservations against this approach.

> Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling
> ------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1404
>                 URL: https://issues.apache.org/jira/browse/YARN-1404
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>    Affects Versions: 2.2.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>         Attachments: YARN-1404.patch
>
>
> Currently Hadoop Yarn expects to manage the lifecycle of the processes in which its applications run their workload. External frameworks/systems could benefit from sharing resources with other Yarn applications while running their workload within long-running processes owned by the external framework (in other words, running their workload outside the context of a Yarn container process).
> Because Yarn provides robust and scalable resource management, it is desirable for some external systems to leverage the resource governance capabilities of Yarn (queues, capacities, scheduling, access control) while supplying their own resource enforcement.
> Impala is an example of such a system. Impala uses Llama (http://cloudera.github.io/llama/) to request resources from Yarn.
> Impala runs an impalad process on every node of the cluster. When a user submits a query, the processing is broken into 'query fragments' which are run in multiple impalad processes, leveraging data locality (similar to Map-Reduce Mappers processing a collocated HDFS block of input data).
> The execution of a 'query fragment' requires an amount of CPU and memory in the impalad, as the impalad shares the host with other services (HDFS DataNode, Yarn NodeManager, HBase RegionServer) and Yarn applications (MapReduce tasks).
> To ensure that cluster utilization follows the Yarn scheduler policies and does not overload the cluster nodes, before running a 'query fragment' on a node Impala requests the required amount of CPU and memory from Yarn. Once the requested CPU and memory have been allocated, Impala starts running the 'query fragment', taking care that the 'query fragment' does not use more resources than the ones that have been allocated. Memory is bookkept per 'query fragment', and the threads used for processing the 'query fragment' are placed under a cgroup to contain CPU utilization.
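> For illustration, the thread placement amounts to roughly the following (an illustrative sketch, not the actual Impala code; the cgroup mount point and the native thread id, which in Java would need a JNI/JNA helper, are assumptions):
> {code:java}
> import java.io.IOException;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.Paths;
> import java.nio.file.StandardOpenOption;
>
> // Sketch: one cgroup per 'query fragment', sized to the CPU granted by Yarn;
> // each worker thread is moved into it by writing its tid to the 'tasks' file.
> class FragmentCgroup {
>   private final Path dir;
>
>   FragmentCgroup(String fragmentId, int cpuShares) throws IOException {
>     dir = Paths.get("/sys/fs/cgroup/cpu/impala", fragmentId); // assumed mount
>     Files.createDirectories(dir);
>     Files.write(dir.resolve("cpu.shares"),
>         Integer.toString(cpuShares).getBytes(), StandardOpenOption.WRITE);
>   }
>
>   /** Move a native thread into this cgroup (cgroup v1 semantics). */
>   void addThread(int nativeTid) throws IOException {
>     Files.write(dir.resolve("tasks"),
>         Integer.toString(nativeTid).getBytes(), StandardOpenOption.WRITE);
>   }
> }
> {code}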
> Today, for all resources that have been requested from the Yarn RM, a (container) process must be started via the corresponding NodeManager. Failing to do this results in the cancellation of the container allocation, relinquishing the acquired resource capacity back to the pool of available resources. To avoid this, Impala starts a dummy container process doing 'sleep 10y'.
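> On the AM side the workaround looks roughly like this (a minimal sketch using the Yarn client API; the Container comes from a normal RM allocation):
> {code:java}
> import java.util.Collections;
> import org.apache.hadoop.yarn.api.records.Container;
> import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
> import org.apache.hadoop.yarn.client.api.NMClient;
> import org.apache.hadoop.yarn.util.Records;
>
> // Sketch: keep the allocation alive by launching a process that only sleeps,
> // while the actual work runs inside the long-lived impalad on the same node.
> void startDummyContainer(NMClient nmClient, Container container)
>     throws Exception {
>   ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
>   ctx.setCommands(Collections.singletonList("sleep 10y"));
>   nmClient.startContainer(container, ctx);
> }
> {code}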
> Using a dummy container process has its drawbacks:
> * The dummy container process is in a cgroup with a given number of CPU shares that are not used, and Impala re-issues those CPU shares to another cgroup for the threads running the 'query fragment'. The cgroup CPU enforcement happens to work correctly because of the CPU controller implementation (but the formally specified behavior is actually undefined).
> * Impala may ask for CPU and memory independently of each other; some requests may be memory only with no CPU, or vice versa. Because a container requires a process, the complete absence of memory or CPU is not possible even if the dummy process is 'sleep': a minimal amount of memory and CPU is required for the dummy process.
> Because of this, it is desirable to be able to have a container without a backing process.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
