hadoop-yarn-issues mailing list archives

From "MENG DING (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-4080) Capacity planning for long running services on YARN
Date Tue, 25 Aug 2015 17:08:45 GMT

     [ https://issues.apache.org/jira/browse/YARN-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

MENG DING updated YARN-4080:
----------------------------
    Description: 
YARN-1197 addresses the functionality of container resource resize. One major use case of
this feature is for long running services managed by Slider to dynamically flex up and down
the resource allocation of individual components (e.g., an HBase region server), based on application
metrics/alerts obtained through a third-party monitoring and policy engine.

One key issue with increasing a container's resources at an arbitrary point in time is that the additional
resources needed by the application component may not be available *on the specific node*.
In this case, we need to rely on preemption logic to reclaim the required resources from
other (preemptable) applications running on the same node. But this may not be possible today
because:
* preemption doesn't consider the constraints of pending resource requests, such as hard locality
requirements, user limits, etc. (being addressed in YARN-2154 and possibly in YARN-3769?)
* there may not be any preemptable container available, because no queue is over
its guaranteed capacity.
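
To make the second point concrete, here is a minimal sketch of the node-local check described above. It is not YARN code: the Queue, Container, and Node classes, the plan_increase function, and the memory-only resource model are all hypothetical and exist only for illustration. It assumes a container is preemptable only while its queue is over its guaranteed capacity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Queue:
    # Hypothetical model of a scheduler queue (memory only).
    name: str
    guaranteed_mb: int
    used_mb: int

    def over_guarantee_mb(self) -> int:
        """Memory this queue holds beyond its guaranteed capacity."""
        return max(0, self.used_mb - self.guaranteed_mb)


@dataclass
class Container:
    container_id: str
    queue: str
    memory_mb: int


@dataclass
class Node:
    capacity_mb: int
    containers: List[Container] = field(default_factory=list)

    def free_mb(self) -> int:
        """Unallocated memory on this node."""
        return self.capacity_mb - sum(c.memory_mb for c in self.containers)


def plan_increase(node: Node, queues: List[Queue], target_id: str,
                  extra_mb: int) -> Tuple[str, List[str]]:
    """Decide how an increase of extra_mb for target_id could be served.

    Returns ("fits", []) if the node has enough free memory,
    ("preempt", victims) if preempting containers on this node suffices,
    or ("blocked", []) if no preemptable container on the node can help.
    """
    shortfall = extra_mb - node.free_mb()
    if shortfall <= 0:
        return "fits", []

    # Assumption: a queue may only lose what it holds above its guarantee.
    budget: Dict[str, int] = {q.name: q.over_guarantee_mb() for q in queues}
    victims: List[str] = []
    reclaimed = 0
    for c in node.containers:
        if c.container_id == target_id or budget.get(c.queue, 0) <= 0:
            continue
        victims.append(c.container_id)
        budget[c.queue] -= c.memory_mb
        reclaimed += c.memory_mb
        if reclaimed >= shortfall:
            return "preempt", victims
    return "blocked", []
```

With a full node and every queue at or below its guarantee, plan_increase returns "blocked" even though the cluster as a whole may have spare capacity elsewhere, which is the situation this issue is about.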

What we need, ideally, is a way for YARN to support future capacity planning for long running
services. At a minimum, we need to provide a way to let YARN know about the resource usage
prediction or pattern of a long running service. Given this knowledge, YARN should be able
to preempt resources from other applications to accommodate the resource needs of the long
running service.

  was:
YARN-1197 addresses the functionality of container resource resize. One major use case of
this feature is for long running services managed by Slider to dynamically flex up and down
resource allocation of individual components (e.g., HBase region server), based on application
metrics/alerts obtained through third-party monitoring and policy engine. 

One key issue with increasing container resource at any point of time is that the additional
resource needed by the application component may not be available *on the specific node*.
In this case, we need to rely on preemption logic to reclaim the required resource back from
other (preemptable) applications running on the same node. But this may not be possible today
because:
* preemption doesn't consider constraints of pending resource requests, such as hard locality
requirements, user limits, etc (being addressed in YARN-2154 and possibly in YARN-3769?) 
* there may not be any preemptable container available due to the fact that no application
is over its guaranteed capacity.

What we need, ideally, is a way for YARN to support future capacity planning of long running
services. At the minimum, we need to provide a way to let YARN know about the resource usage
prediction/pattern of a long running service. And given this knowledge, YARN should be able
to preempt resources from other applications to accommodate the resource needs of the long
running service.


> Capacity planning for long running services on YARN
> ---------------------------------------------------
>
>                 Key: YARN-4080
>                 URL: https://issues.apache.org/jira/browse/YARN-4080
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api, resourcemanager
>            Reporter: MENG DING
>
> YARN-1197 addresses the functionality of container resource resize. One major use case
of this feature is for long running services managed by Slider to dynamically flex up and
down the resource allocation of individual components (e.g., an HBase region server), based on application
metrics/alerts obtained through a third-party monitoring and policy engine.
> One key issue with increasing a container's resources at an arbitrary point in time is that the additional
resources needed by the application component may not be available *on the specific node*.
In this case, we need to rely on preemption logic to reclaim the required resources from
other (preemptable) applications running on the same node. But this may not be possible today
because:
> * preemption doesn't consider the constraints of pending resource requests, such as hard
locality requirements, user limits, etc. (being addressed in YARN-2154 and possibly in YARN-3769?)
> * there may not be any preemptable container available, because no queue
is over its guaranteed capacity.
> What we need, ideally, is a way for YARN to support future capacity planning for long
running services. At a minimum, we need to provide a way to let YARN know about the resource
usage prediction or pattern of a long running service. Given this knowledge, YARN should
be able to preempt resources from other applications to accommodate the resource needs of
the long running service.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
