hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
Date Fri, 21 Nov 2014 23:06:34 GMT

    [ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221556#comment-14221556 ]

Wangda Tan commented on YARN-2139:

Thanks [~ywskycn] for the design doc and prototype.

I share [~acmurthy]'s feeling that disk as a resource is a little different from vcores.
CPU is a shared resource: processes and threads occupy CPU cores and can easily be switched
to other cores. Disks are not like that (RAID aside): if a process is writing to a file on
a local disk (as Kafka does), you cannot easily move the file being written to another
disk.

We also need to consider that if multiple containers are scheduled onto the same physical
disk, the total bandwidth of these containers may drop very quickly.

So I think scheduling for disks is more like *affinity* to disks (e.g., give disks #1, #2,
and #4 to the container) than just limiting the number of processes on each node.
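
To make the affinity idea concrete, here is a minimal sketch in Python. It is not YARN code
and not what the prototype patch does; `Node`, `allocate`, and `max_per_disk` are
hypothetical names chosen for illustration. It assumes each container asks for a number of
disks and is pinned to specific disk IDs, rather than the node only tracking a per-node
process count.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a node's local disks; not a YARN class."""
    num_disks: int
    # disk id -> ids of the containers pinned to that disk
    assignments: dict = field(init=False)

    def __post_init__(self):
        self.assignments = {d: [] for d in range(1, self.num_disks + 1)}

    def allocate(self, container_id, disks_wanted, max_per_disk=1):
        """Pin a container to specific disks, or return None if it does not fit."""
        # Only disks that still have room are candidates.
        free = [d for d, owners in self.assignments.items()
                if len(owners) < max_per_disk]
        if len(free) < disks_wanted:
            return None
        # Prefer the least-loaded disks; break ties by disk id.
        free.sort(key=lambda d: (len(self.assignments[d]), d))
        chosen = sorted(free[:disks_wanted])
        for d in chosen:
            self.assignments[d].append(container_id)
        return chosen


node = Node(num_disks=4)
first = node.allocate("c1", disks_wanted=2)   # pinned to disks 1 and 2
second = node.allocate("c2", disks_wanted=2)  # pinned to disks 3 and 4
third = node.allocate("c3", disks_wanted=1)   # no free disk left, so rejected
print(first, second, third)  # prints: [1, 2] [3, 4] None
```

With `max_per_disk=1` no two containers share a physical disk, so the bandwidth drop
described above cannot happen. Raising it allows sharing while still recording which disks
each container owns.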

Any thoughts? Please feel free to correct me if I am wrong.


> [Umbrella] Support for Disk as a Resource in YARN 
> --------------------------------------------------
>                 Key: YARN-2139
>                 URL: https://issues.apache.org/jira/browse/YARN-2139
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wei Yan
>         Attachments: Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, YARN-2139-prototype.patch
> YARN should consider disk as another resource for (1) scheduling tasks on nodes, (2) isolation at runtime, (3) spindle locality.

