hadoop-common-dev mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5964) Fix the 'cluster drain' problem in the Capacity Scheduler wrt High RAM Jobs
Date Thu, 18 Jun 2009 15:52:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721321#action_12721321 ]

Hemanth Yamijala commented on HADOOP-5964:
------------------------------------------

I am looking at this patch as comprising three separate parts:
- Changes to the scheduler to fix the under-utilization problem in the face of high RAM jobs
- The new TaskTracker class, its lifecycle, and the changes in JobTracker to support it
- The changes to the old TaskTracker class to account for the number of slots

I've currently reviewed the first part and part of the second.

Some comments so far:

TaskTrackerStatus:
 - countOccupiedMapSlots: the check for whether a task is running, based on its status, is complicated enough to move into an API that can be called from both countMapTasks and this method. That way, any change to it keeps both APIs behaving consistently. Likewise for reduces.
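A sketch of what I mean, with illustrative names (TaskState, the fields, and the method names are placeholders, not the actual Hadoop API):

```java
import java.util.Arrays;
import java.util.List;

public class TaskTrackerStatusSketch {
    enum TaskState { RUNNING, COMMIT_PENDING, UNASSIGNED, SUCCEEDED, FAILED }

    static class TaskStatus {
        final TaskState state;
        final boolean isMap;
        final int numSlots;
        TaskStatus(TaskState state, boolean isMap, int numSlots) {
            this.state = state; this.isMap = isMap; this.numSlots = numSlots;
        }
    }

    // Single source of truth for "is this task running?"; both counters
    // call this, so a change to the definition fixes both at once.
    static boolean isTaskRunning(TaskStatus ts) {
        return ts.state == TaskState.RUNNING
            || ts.state == TaskState.COMMIT_PENDING
            || ts.state == TaskState.UNASSIGNED;
    }

    // Counts running map tasks, one per task regardless of slot usage.
    static int countMapTasks(List<TaskStatus> tasks) {
        return (int) tasks.stream()
                          .filter(t -> t.isMap && isTaskRunning(t))
                          .count();
    }

    // Counts occupied map slots, weighting each task by its slot usage
    // (a high-RAM task may occupy more than one slot).
    static int countOccupiedMapSlots(List<TaskStatus> tasks) {
        return tasks.stream()
                    .filter(t -> t.isMap && isTaskRunning(t))
                    .mapToInt(t -> t.numSlots)
                    .sum();
    }
}
```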

mapreduce.TaskTracker:
 - reserveSlots: the javadoc refers only to reserving 'map' slots.
 - Why do we need to maintain a count of reserved slots (numFallowMapSlots)? I see that the accessor API is not used anywhere.

CapacityTaskScheduler:
 - Why are we reserving all available slots on the tasktracker? Shouldn't we always reserve only as much as this job requires? In that case, do we need a re-reservation?
 - When we try to get a task for a job while ignoring user limits (i.e., when the cluster is free), we are not reserving TTs. Is this by design? Also, is it for the same reason that we are not checking user limits when assigning a task to a reserved TT?
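To make the first point concrete, here is a minimal sketch of the policy I'd expect, with illustrative names (this is not the patch's code): the reservation is clamped to the job's remaining demand rather than taking everything free on the tracker.

```java
public class ReservationPolicy {
    // Slots to reserve on this heartbeat: never more than the tracker has
    // free, and never more than the job still needs. Negative inputs are
    // treated as zero so a stale count can't produce a negative reservation.
    static int slotsToReserve(int freeSlotsOnTracker, int slotsJobStillNeeds) {
        return Math.min(Math.max(freeSlotsOnTracker, 0),
                        Math.max(slotsJobStillNeeds, 0));
    }
}
```

With this shape, a tracker with 4 free slots and a job needing 2 more would get a reservation of 2, and the question of re-reservation may not arise.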

JobConf:
 - Since computeNumSlotsPerMap is used only by the CapacityScheduler right now, should we just leave this computation out of JobConf?
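The computation itself is small enough to live in the scheduler. A sketch of what that could look like, assuming the usual ceiling division of task memory by slot memory (method and parameter names are illustrative):

```java
public class SlotComputation {
    // Number of slots a task occupies: ceil(taskMemoryMB / slotMemoryMB).
    // If memory-based scheduling is disabled (slot size unset or zero),
    // every task takes exactly one slot.
    static int computeNumSlotsPerTask(long taskMemoryMB, long slotMemoryMB) {
        if (slotMemoryMB <= 0) {
            return 1;
        }
        // Ceiling division without floating point.
        return (int) ((taskMemoryMB + slotMemoryMB - 1) / slotMemoryMB);
    }
}
```

For example, a 3 GB map task on a cluster configured with 2 GB slots would occupy 2 slots.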

JobInitializationPoller:
 - Let's not pass the scheduler instance to the poller. I think it only needs the number of map slots and reduce slots, so we can pass just those. We've seen in the past that passing entire objects like the scheduler makes classes difficult to test; also, not all of the information is required.
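The shape of the change I'm suggesting, with illustrative names:

```java
public class JobInitializationPollerSketch {
    private final int clusterMapSlots;
    private final int clusterReduceSlots;

    // Narrow constructor: instead of
    //   JobInitializationPoller(CapacityTaskScheduler scheduler)
    // take only the two values the poller actually uses. A unit test can
    // then construct the poller from two plain ints, with no scheduler
    // (and its JobTracker dependency) to mock.
    JobInitializationPollerSketch(int clusterMapSlots, int clusterReduceSlots) {
        this.clusterMapSlots = clusterMapSlots;
        this.clusterReduceSlots = clusterReduceSlots;
    }

    int getClusterMapSlots() { return clusterMapSlots; }
    int getClusterReduceSlots() { return clusterReduceSlots; }
}
```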

JobTracker:
 - When a job is killed, we are not clearing the trackers reserved for that job.
 - Likewise, when a TT is blacklisted, do we need to remove its reservations?
 - It seems the changes in JobTracker could be reduced a little if we do not change APIs that take a TTStatus object or a tasktracker name. We can still change the maps to be built of TaskTracker objects, but retrieve the status wherever necessary and pass it to methods. That way the changes are fewer and easier to verify. I think this is possible in the ExpireTrackers class, for example.
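On the first two points, a small registry sketch showing the two release paths I'd expect to exist (all names are illustrative, not the actual JobTracker fields):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ReservationRegistry {
    // jobId -> names of trackers holding reserved slots for that job
    private final Map<String, Set<String>> reservedTrackers = new HashMap<>();

    void reserve(String jobId, String trackerName) {
        reservedTrackers.computeIfAbsent(jobId, k -> new HashSet<>())
                        .add(trackerName);
    }

    // Job killed or completed: drop every reservation it holds, and
    // return the trackers so their slots can be freed.
    Set<String> releaseJob(String jobId) {
        Set<String> released = reservedTrackers.remove(jobId);
        return released == null ? Collections.emptySet() : released;
    }

    // Tracker blacklisted or lost: remove it from every job's reservations.
    void releaseTracker(String trackerName) {
        for (Set<String> trackers : reservedTrackers.values()) {
            trackers.remove(trackerName);
        }
        reservedTrackers.values().removeIf(Set::isEmpty);
    }

    int reservationCount(String jobId) {
        return reservedTrackers.getOrDefault(jobId, Collections.emptySet())
                               .size();
    }
}
```

Without both release paths, a killed job or a blacklisted TT leaves slots permanently reserved, which is the same drain problem in a different form.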

Some nits:
- In some places, lines run over 80 characters (e.g., mapreduce.TaskTracker.java).
- There are a lot of LOG.info statements, possibly left in for testing/debugging. Can you please remove them?
- 'Fallow' seems a complicated word to understand. Is 'Reserved' good enough?

Will continue with the review...

> Fix the 'cluster drain' problem in the Capacity Scheduler wrt High RAM Jobs
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-5964
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5964
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.20.0
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-5964_0_20090602.patch, HADOOP-5964_1_20090608.patch, HADOOP-5964_2_20090609.patch,
> HADOOP-5964_4_20090615.patch, HADOOP-5964_6_20090617.patch, HADOOP-5964_7_20090618.patch
>
>
> When a HighRAMJob turns up at the head of the queue, the current implementation of support
> for HighRAMJobs in the Capacity Scheduler has a problem: the scheduler stops assigning
> tasks to all TaskTrackers in the cluster until the HighRAMJob finds suitable TaskTrackers
> for all its tasks.
> This causes a severe utilization problem, since effectively no new tasks are allowed to
> run until the HighRAMJob (at the head of the queue) gets slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

