hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
Date Wed, 03 Dec 2014 18:57:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233341#comment-14233341 ]

Bikas Saha commented on YARN-2139:
----------------------------------

So to be clear, currently vdisks is counting the number of physical drives present on the
box.

Something to keep in mind is whether this also entails a change in the NM policy of
providing a directory on every local dir (which typically maps to every disk) to every task.
Tasks are free to choose one or more of those dirs (disks) to write to. This puts the
spinning disk head under contention and affects the performance of all writers on that disk
because seeks are expensive. The rule of thumb tends to be to allocate as many tasks to a
machine as it has disks (maybe 2x) so as to keep this seek cost low. Should we consider
evaluating a change in this policy that gives 1 local dir to a container with 1 vdisk? This
way a machine with 6 disks (and 6 vdisks) would have 6 tasks running, each with its
own "dedicated" disk. Offhand it's hard to say how this would compare with all 6 disks allocated
to all 6 tasks and letting cgroups enforce sharing. If multiple tasks end up choosing the
same disk for their writes, then they may not end up getting the "allocation" that they thought
they would get.
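
To make the two policies concrete, here is a rough sketch in Python. It is not NodeManager code: the function names, the container ids, and the /data/N/yarn/local paths are all hypothetical, and it only counts writers per disk rather than modelling seek cost or cgroups.

```python
from collections import Counter


def assign_dedicated_dirs(local_dirs, vdisk_requests):
    """Hypothetical policy: a container asking for k vdisks gets k exclusive local dirs."""
    free = list(local_dirs)
    assignment = {}
    for container_id, vdisks in vdisk_requests.items():
        if vdisks > len(free):
            raise ValueError(f"not enough free disks for {container_id}")
        # Hand out the next k free dirs; no other container will receive them.
        assignment[container_id] = [free.pop(0) for _ in range(vdisks)]
    return assignment


def writers_per_disk(chosen_dirs):
    """Count how many containers write to each dir (disk)."""
    return Counter(d for dirs in chosen_dirs.values() for d in dirs)


# Assumed layout: 6 disks, one local dir per disk, 6 tasks with 1 vdisk each.
local_dirs = [f"/data/{i}/yarn/local" for i in range(6)]
requests = {f"container_{i}": 1 for i in range(6)}

# Dedicated policy: every disk ends up with exactly one writer.
dedicated = assign_dedicated_dirs(local_dirs, requests)
dedicated_load = writers_per_disk(dedicated)

# Current policy: every task sees all dirs and picks for itself. In the worst
# case all 6 tasks pick the same dir, so one disk serves 6 writers.
worst_case_choice = {c: [local_dirs[0]] for c in requests}
shared_load = writers_per_disk(worst_case_choice)

print(max(dedicated_load.values()), max(shared_load.values()))
```

The worst case shown for the shared policy is only one possible outcome; how often tasks actually collide depends on how they choose among the dirs they are given.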

> [Umbrella] Support for Disk as a Resource in YARN 
> --------------------------------------------------
>
>                 Key: YARN-2139
>                 URL: https://issues.apache.org/jira/browse/YARN-2139
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wei Yan
>         Attachments: Disk_IO_Isolation_Scheduling_3.pdf, Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, YARN-2139-prototype.patch
>
>
> YARN should consider disk as another resource for (1) scheduling tasks on nodes, (2)
> isolation at runtime, (3) spindle locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
