hadoop-yarn-issues mailing list archives

From "Ha Son Hai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5202) Dynamic Overcommit of Node Resources - POC
Date Thu, 18 Aug 2016 22:23:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427275#comment-15427275 ]

Ha Son Hai commented on YARN-5202:
----------------------------------

Hi [~nroberts]!

Would you mind explaining a little bit more about the parameter "RM_OVERCOMMIT_MEM_MAX_FACTOR"?
I set it to a different value, but the ResourceManager log still reports the value as 0. It's the same for vcoreFactor. I wonder if it's a bug?

By the way, is RM_OVERCOMMIT_MEM_MAX_FACTOR redundant with RM_OVERCOMMIT_MEM_INCREMENT? One is a ratio and the other is in megabytes.
In the case that I have a node with 32 GB of RAM, if I set RM_OVERCOMMIT_MEM_MAX_FACTOR to 2, does it mean that I can over-commit up to 2 times the total memory that I have (in case utilization is very low), that is 64 GB?

Sorry if the question is a "basic" or a "stupid" one. I have just started to work with the Hadoop code, so there are a lot of things that are new to me.
Thanks a lot for your clarification. I have attached your explanation of the parameters below, followed by a small sketch of my current understanding.


    + RM_OVERCOMMIT_MEM_INCREMENT: Specifies the largest memory increment in megabytes when enlarging a node's total resource for overcommit. Once incremented, at least one container must be launched on the node to increase the value further. A value <= 0 will disable memory overcommit.
    + RM_OVERCOMMIT_MEM_MAX_FACTOR: Maximum amount of memory to overcommit as a factor of the total node memory. A value <= 0 disables memory overcommit.
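
To make sure I am reading the two knobs correctly, here is a minimal sketch of how I currently understand them to interact, based only on the descriptions above. The class name, constants, and values are mine for illustration; they are not taken from the patch:

{code:java}
// Hypothetical sketch, only to check my understanding of the two knobs.
// Names and values are illustrative; they are not from the patch.
public class OvercommitUnderstanding {

    static final long MEM_INCREMENT_MB = 1024;  // assumed RM_OVERCOMMIT_MEM_INCREMENT
    static final double MEM_MAX_FACTOR = 2.0;   // assumed RM_OVERCOMMIT_MEM_MAX_FACTOR

    public static void main(String[] args) {
        long physicalMb = 32 * 1024;  // the 32 GB node from my question

        // Reading "maximum amount of memory to overcommit as a factor of
        // the total node memory" literally: factor 2.0 on a 32 GB node
        // would allow up to 64 GB of *extra* memory, so the node could
        // advertise up to 96 GB in total. If the factor instead caps the
        // new *total*, the ceiling would be 64 GB. This is exactly the
        // ambiguity I am asking about.
        long ceilingMb = physicalMb + (long) (MEM_MAX_FACTOR * physicalMb);

        // The increment then bounds how fast the advertised size may grow:
        // at most MEM_INCREMENT_MB per step, and per the description at
        // least one container must launch on the node between steps.
        long advertisedMb = physicalMb;
        while (advertisedMb < ceilingMb) {
            advertisedMb += Math.min(MEM_INCREMENT_MB, ceilingMb - advertisedMb);
            System.out.printf("advertised: %d MB (ceiling: %d MB)%n",
                    advertisedMb, ceilingMb);
            // ... the real scheduler would wait for a container launch here ...
        }
    }
}
{code}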


> Dynamic Overcommit of Node Resources - POC
> ------------------------------------------
>
>                 Key: YARN-5202
>                 URL: https://issues.apache.org/jira/browse/YARN-5202
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, resourcemanager
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>         Attachments: YARN-5202-branch2.7-uber.patch, YARN-5202.patch
>
>
> This Jira is to present a proof-of-concept implementation (collaboration between [~jlowe] and myself) of a dynamic over-commit implementation in YARN. The type of over-commit implemented in this jira is similar to but not as full-featured as what's being implemented via YARN-1011. YARN-1011 is where we see ourselves heading, but we needed something quick and completely transparent so that we could test it at scale with our varying workloads (mainly MapReduce, Spark, and Tez). Doing so has shed some light on how much additional capacity we can achieve with over-commit approaches, and has fleshed out some of the problems these approaches will face.
> Primary design goals:
> - Avoid changing protocols, application frameworks, or core scheduler logic: simply adjust individual nodes' available resources based on current node utilization, and then let the scheduler do what it normally does
> - Over-commit slowly, pull back aggressively: if things are looking good and there is demand, slowly add resources. If memory starts to look over-utilized, aggressively reduce the amount of over-commit (a sketch of this policy follows the quoted description).
> - Make sure the nodes protect themselves: i.e. if memory utilization on a node gets too high, preempt something, preferably something from a preemptable queue
> A patch against trunk will be attached shortly. Some notes on the patch:
> - This feature was originally developed against something akin to 2.7. Since the patch is mainly to explain the approach, we didn't do any sort of testing against trunk except for basic build and basic unit tests
> - The key pieces of functionality are in {{SchedulerNode}}, {{AbstractYarnScheduler}}, and {{NodeResourceMonitorImpl}}. The remainder of the patch is mainly UI, Config, Metrics, Tests, and some minor code duplication (e.g. to optimize node resource changes we treat an over-commit resource change differently than an updateNodeResource change, i.e. remove_node/add_node is just too expensive for the frequency of over-commit changes)
> - We only over-commit memory at this point.
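
A minimal sketch of the "over-commit slowly, pull back aggressively" policy from the design goals quoted above; the thresholds, names, and exact pull-back behaviour are illustrative assumptions, not the patch's implementation:

{code:java}
// Illustrative sketch of the "over-commit slowly, pull back aggressively"
// policy from the design goals above. Thresholds, names, and the exact
// pull-back behaviour are assumptions, not the patch's implementation.
public class OvercommitPolicySketch {

    static final long INCREMENT_MB = 1024;       // slow-growth step (assumed)
    static final double HIGH_WATERMARK = 0.90;   // pull-back trigger (assumed)

    /**
     * Computes the next advertised memory size for a node, given its
     * physical memory, the currently advertised size, and the measured
     * memory utilization (fraction of the advertised size in use).
     */
    static long adjust(long physicalMb, long advertisedMb, double utilization) {
        if (utilization >= HIGH_WATERMARK) {
            // Pull back aggressively: snap straight back to physical
            // capacity. Per the third goal, the node would additionally
            // preempt a container (preferably from a preemptable queue)
            // to protect itself if memory stays too high.
            return physicalMb;
        }
        // Over-commit slowly: grow by one small increment per cycle.
        // A real policy would also require demand and respect the
        // RM_OVERCOMMIT_MEM_MAX_FACTOR ceiling discussed above.
        return advertisedMb + INCREMENT_MB;
    }

    public static void main(String[] args) {
        long physicalMb = 32 * 1024;
        long advertisedMb = adjust(physicalMb, physicalMb, 0.50);
        System.out.println(advertisedMb);                            // 33792: grew one step
        System.out.println(adjust(physicalMb, advertisedMb, 0.95));  // 32768: pulled back
    }
}
{code}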



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
