hadoop-yarn-issues mailing list archives

From "Weiwei Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service
Date Wed, 06 Jun 2018 12:24:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503212#comment-16503212 ]

Weiwei Yang commented on YARN-8320:

Hi [~miklos.szegedi@cloudera.com],

What I mean is that letting users set both #vcore and #cpu in their resource requests is too complex.
Even in phase 1, where only EXCLUSIVE mode is supported, consider a NodeManager with this capacity:

  #vcore: 100
  #cpu: 10

If a user wants exclusive CPUs, the request must look like this:

  #vcore: 10 * N
  #cpu: N (0<N<=10)
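The EXCLUSIVE-mode constraint above can be sketched as a simple consistency check. This is a hypothetical Python helper, not YARN code: the 100-vcore / 10-CPU capacity is taken from the example, and the assumption that vcores divide evenly across CPUs is mine.

```python
# Hypothetical sketch, not YARN code: node capacity from the example above.
NODE_VCORES = 100
NODE_CPUS = 10


def vcores_per_cpu(node_vcores: int, node_cpus: int) -> int:
    """Vcores represented by one physical CPU (assumes an even split)."""
    if node_cpus <= 0 or node_vcores % node_cpus != 0:
        raise ValueError("sketch assumes vcores divide evenly across CPUs")
    return node_vcores // node_cpus


def is_consistent_exclusive(req_vcores: int, req_cpus: int,
                            node_vcores: int = NODE_VCORES,
                            node_cpus: int = NODE_CPUS) -> bool:
    """True if an EXCLUSIVE request asks for exactly the vcores its CPUs represent."""
    ratio = vcores_per_cpu(node_vcores, node_cpus)
    return 0 < req_cpus <= node_cpus and req_vcores == ratio * req_cpus


print(is_consistent_exclusive(30, 3))   # True
print(is_consistent_exclusive(80, 9))   # False: 9 CPUs represent 90 vcores
```

The point of the check is that the user must keep two numbers in a fixed ratio that depends on the node, which is exactly the burden described above.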

If {{#vcore < 10 * N}}, some CPU capacity is wasted. For example, if a user requests

  #vcore: 80
  #cpu: 9

then after allocation the NM has this capacity left:

  #vcore: 20
  #cpu: 1

Now when a container with #vcore=20 lands on this node, it can only get 10% of the node's CPU time (instead of
20%), since 9 CPUs are already occupied by the previous request. This is not what users expect. RESERVED/SHARED
mode would be even more complex: users would not be able to know how many CPUs to specify in their requests to
achieve RESERVED/SHARED-mode CPU sharing.
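The arithmetic of this example can be replayed with a small model. This is a hypothetical sketch, not NodeManager logic: it assumes a container without pinned CPUs can use at most the CPUs not pinned by exclusive containers, and that its expected share is proportional to its vcores.

```python
from fractions import Fraction

# Node capacity from the example: 100 vcores backed by 10 physical CPUs.
NODE_VCORES, NODE_CPUS = 100, 10


def allocate(free, request):
    """Deduct a (vcores, cpus) request from free (vcores, cpus) capacity."""
    fv, fc = free
    rv, rc = request
    if rv > fv or rc > fc:
        raise ValueError("request does not fit on this node")
    return fv - rv, fc - rc


def stranded_vcores(request):
    """Vcores' worth of CPU time pinned by the request but not charged to it."""
    rv, rc = request
    return rc * (NODE_VCORES // NODE_CPUS) - rv


def cpu_share(container_vcores, unpinned_cpus):
    """(expected, achievable) fraction of node CPU time for a shared container.

    Expected is proportional to vcores; achievable is additionally capped by
    the CPUs that are not pinned by exclusive containers (assumed model).
    """
    expected = Fraction(container_vcores, NODE_VCORES)
    achievable = min(expected, Fraction(unpinned_cpus, NODE_CPUS))
    return expected, achievable


free = allocate((NODE_VCORES, NODE_CPUS), (80, 9))
print(free)                      # (20, 1)
print(stranded_vcores((80, 9)))  # 10
print(cpu_share(20, free[1]))    # (Fraction(1, 5), Fraction(1, 10))
```

Under this model the 10 stranded vcores are what turns the expected 20% into 10%.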

Does this make sense?


> [Umbrella] Support CPU isolation for latency-sensitive (LS) service
> -------------------------------------------------------------------
>                 Key: YARN-8320
>                 URL: https://issues.apache.org/jira/browse/YARN-8320
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: Jiandan Yang 
>            Priority: Major
>         Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, CPU-isolation-for-latency-sensitive-services-v2.pdf,
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and “cpu.shares” to isolate CPU resources. However,
>  * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler with no support for differentiated latency.
>  * The request latency of services running in containers may fluctuate frequently when all containers share CPUs, which latency-sensitive services cannot afford in our production environment.
> So we need more fine-grained CPU isolation.
> Here we propose a solution that uses cgroup cpuset to bind containers to different processors, inspired by the isolation technique in [Borg system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].
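Binding containers to processors with cpuset boils down to handing out disjoint CPU ids and rendering them in the cgroup cpuset list format (for example "0-2,5"). The sketch below is hypothetical: the class and method names are illustrative, it uses simple first-fit assignment, and it omits actually writing the string to a container cgroup's cpuset.cpus file.

```python
def format_cpuset(cpus):
    """Render CPU ids in the cgroup cpuset list format, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    ids = sorted(set(cpus))
    parts = []
    i = 0
    while i < len(ids):
        j = i
        # Extend j while the ids stay consecutive.
        while j + 1 < len(ids) and ids[j + 1] == ids[j] + 1:
            j += 1
        parts.append(str(ids[i]) if i == j else f"{ids[i]}-{ids[j]}")
        i = j + 1
    return ",".join(parts)


class ExclusiveCpuAllocator:
    """Hypothetical allocator that hands out disjoint CPU ids to containers."""

    def __init__(self, num_cpus):
        self.free = list(range(num_cpus))
        self.assigned = {}

    def assign(self, container_id, n):
        """Pin n free CPUs (first fit) to a container; return its cpuset string."""
        if container_id in self.assigned:
            raise ValueError(f"{container_id} already has CPUs")
        if n <= 0 or n > len(self.free):
            raise ValueError("not enough free CPUs")
        cpus, self.free = self.free[:n], self.free[n:]
        self.assigned[container_id] = cpus
        return format_cpuset(cpus)

    def release(self, container_id):
        """Return a container's CPUs to the free pool."""
        self.free = sorted(self.free + self.assigned.pop(container_id))

    def shared_cpuset(self):
        """CPUs left for containers that are not pinned."""
        return format_cpuset(self.free)


alloc = ExclusiveCpuAllocator(10)
print(alloc.assign("container_1", 9))  # 0-8
print(alloc.shared_cpuset())           # 9
```

With the 9-CPU request from the comment, only CPU 9 remains for every unpinned container, which is the situation described above. A production allocator would likely also need to account for CPU topology, which this sketch ignores.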

This message was sent by Atlassian JIRA
