hadoop-common-dev mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4035) Modify the capacity scheduler (HADOOP-3445) to schedule tasks based on memory requirements and task trackers free memory
Date Mon, 03 Nov 2008 08:44:46 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644660#action_12644660
] 

Hemanth Yamijala commented on HADOOP-4035:
------------------------------------------

bq. Should the memory-related config values be expressed in MB or GB or KB or just bytes?
At first glance, MB sounds good to me. However, the other parameter we have in Hadoop related
to memory, mapred.child.ulimit, is specified in KB. I think expressing these values in KB as
well would keep things consistent.

bq. If a job's specified VM or RAM task limit is higher than the max limit, that job shouldn't
be allowed to run. Should the JT reject the job when it is submitted, or should the scheduler
do it, by failing the job?
I think having the scheduler fail the job is more consistent, since the scheduling decisions
are made in the scheduler.

bq. Should the Capacity Scheduler use the entire RAM of a TT when making a scheduling decision,
or an offset?
I am not really sure either way. Given our earlier discussions concluding that virtual memory
is what really matters, I am guessing we don't need an offset.

Regarding the config variable names, a few concerns/suggestions:
- mapred.tasktracker.virtualmemory.reserved: This name suggests the amount of memory reserved
for Hadoop, whereas it means the opposite. Can we call it mapred.tasktracker.vmem.excluded?
- We are using both 'virtualmemory' and 'vm' to represent virtual memory. Should we consistently
name it 'vmem' everywhere?
- Similar to 'excluded', rename the variables to mapred.task.maxvmem.default and
mapred.task.maxvmem.limit?

Does this make sense?
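
For concreteness, a hadoop-site.xml fragment using the names suggested above might look like
the following. This is purely illustrative: the property names are the renaming proposals from
this comment (not committed config keys), and the values are made-up examples in KB, matching
the unit of mapred.child.ulimit.

```xml
<!-- Hypothetical fragment illustrating the proposed names; values are
     examples only, expressed in KB to match mapred.child.ulimit. -->
<property>
  <name>mapred.tasktracker.vmem.excluded</name>
  <value>1048576</value> <!-- vmem on the TT not available to Hadoop tasks -->
</property>
<property>
  <name>mapred.task.maxvmem.default</name>
  <value>2097152</value> <!-- per-task vmem assumed when a job specifies none -->
</property>
<property>
  <name>mapred.task.maxvmem.limit</name>
  <value>4194304</value> <!-- jobs requesting more than this are failed -->
</property>
```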

> Modify the capacity scheduler (HADOOP-3445) to schedule tasks based on memory requirements and task trackers free memory
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4035
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4035
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: 4035.1.patch, HADOOP-4035-20080918.1.txt, HADOOP-4035-20081006.1.txt,
>                      HADOOP-4035-20081006.txt, HADOOP-4035-20081008.txt
>
>
> HADOOP-3759 introduced configuration variables that can be used to specify memory requirements
> for jobs, and also modified the tasktrackers to report their free memory. The capacity scheduler
> in HADOOP-3445 should schedule tasks based on these parameters. A task that is scheduled on
> a TT and that uses more than the default amount of memory per slot can be viewed as effectively
> using more than one slot, as it would decrease the amount of free memory on the TT by more
> than the default amount while it runs. The scheduler should make the used capacity account
> for this additional usage while enforcing limits, etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

