hadoop-common-dev mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4035) Modify the capacity scheduler (HADOOP-3445) to schedule tasks based on memory requirements and task trackers free memory
Date Wed, 29 Oct 2008 05:30:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643414#action_12643414 ]

Allen Wittenauer commented on HADOOP-4035:
------------------------------------------

> I guess that would work, but in general it works better if we have ratios instead since
> they automatically scale as hardware improves.

I disagree.  Matei is right on.

This value needs to be an offset of the total amount of memory on the machine ("hadoop may
use all but 4g").  Percentages don't really work well here because any ops team worth its
salt knows exactly how much it needs to reserve for its own stuff: the OS, monitoring probes,
and all the other processes that run in the background as noise.  That size is almost
guaranteed to be a constant on similar gear with the same OS. [... and if one is doing a
radically heterogeneous cluster, they've got other problems besides this one!]

Setting this to a percentage is actually going to leave memory on the table. In our real-world
grids, every rack has one node with 16g phys RAM, with the rest of the nodes being 8g phys
RAM.  All nodes have 24g of swap, so our totals are 32g and 40g.  Setting this to 87.5% (the
fact that I had to stop and work that value out should be another hint!) to reserve 4g of VM
on the 32g nodes means we reserve 5g on the 40g nodes, losing 1g of memory that could be
available to Hadoop on the 16g RAM nodes!
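
To make the arithmetic concrete, here is some napkin math (the numbers mirror the grid
above; the script itself is just an illustration, not anything in Hadoop):

    # Napkin math: fixed-offset vs. percentage reservation.
    # Totals are phys RAM + 24g swap for the two node types above.
    nodes = {"8g-ram node": 32, "16g-ram node": 40}  # total VM, in GB
    reserve_gb = 4                                   # what ops actually needs
    usable_pct = 1 - reserve_gb / 32                 # 87.5%, tuned to the small node

    for name, total in nodes.items():
        by_offset = total - reserve_gb
        by_pct = total * usable_pct
        print(f"{name}: offset gives {by_offset}g, percentage gives {by_pct}g")

    # 8g-ram node:  offset gives 28g, percentage gives 28.0g  (identical)
    # 16g-ram node: offset gives 36g, percentage gives 35.0g  (1g left on the table)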

> Modify the capacity scheduler (HADOOP-3445) to schedule tasks based on memory requirements and task trackers free memory
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4035
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4035
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: 4035.1.patch, HADOOP-4035-20080918.1.txt, HADOOP-4035-20081006.1.txt, HADOOP-4035-20081006.txt, HADOOP-4035-20081008.txt
>
>
> HADOOP-3759 introduced configuration variables that can be used to specify memory requirements
> for jobs, and also modified the tasktrackers to report their free memory. The capacity scheduler
> in HADOOP-3445 should schedule tasks based on these parameters. A task that is scheduled on
> a TT and uses more than the default amount of memory per slot can be viewed as effectively
> using more than one slot, as it would decrease the amount of free memory on the TT by more
> than the default amount while it runs. The scheduler should make the used capacity account
> for this additional usage while enforcing limits, etc.
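
A minimal sketch of that slot-accounting idea (the function name and parameters here are
illustrative assumptions, not the actual scheduler API):

    import math

    # Illustrative only: how many slots a memory-hungry task effectively occupies.
    def effective_slots(task_mem_mb, default_mem_per_slot_mb):
        # A task needing more than the per-slot default eats into the TT's
        # free memory as if it held several slots at once.
        return math.ceil(task_mem_mb / default_mem_per_slot_mb)

    # A 3g task on a TT with 1g per slot counts as 3 slots against capacity.
    print(effective_slots(3072, 1024))  # -> 3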

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

