hadoop-common-dev mailing list archives

From "Matei Zaharia (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4803) large pending jobs hog resources
Date Mon, 15 Dec 2008 20:29:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656738#action_12656738 ]

Matei Zaharia commented on HADOOP-4803:

I agree, I think the "catching up" idea could help here. Basically the problem is the following:
if a job with long tasks makes it to the head of the queue (max deficit), it may grab a *lot*
of slots and hold onto them for a while. Instead, we should give it a more moderate share -
enough that it can "catch up" in a reasonable time. It may also be good to take the job's task
durations into account in this equation, i.e. look further ahead for jobs with large tasks.
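The moderated-share idea above can be sketched in code. This is a minimal illustration, not the actual fair-share scheduler API: the class name `CatchUpPolicy`, the method `slotsToGrant`, and the `catchUpWindowMs` parameter are all hypothetical. The sketch grants only as many slots as are needed to repay the job's deficit over a window, and stretches that window for jobs with long tasks, as suggested.

```java
// Hypothetical sketch of the "catch up" idea: instead of handing the job at
// the head of the deficit queue every free slot, cap its grant so the deficit
// is repaid gradually. Names here are illustrative, not the scheduler's API.
public class CatchUpPolicy {

    /**
     * Slots to grant a job on this heartbeat.
     *
     * @param deficitMs       accumulated deficit, in slot-milliseconds owed
     * @param avgTaskMs       average task duration for the job
     * @param catchUpWindowMs time over which we want the deficit repaid
     * @param freeSlots       slots currently free on the cluster
     */
    public static int slotsToGrant(long deficitMs, long avgTaskMs,
                                   long catchUpWindowMs, int freeSlots) {
        if (deficitMs <= 0 || catchUpWindowMs <= 0) {
            return 0;
        }
        // Jobs with long tasks hold each slot for at least avgTaskMs, so
        // "look further ahead" for them by stretching the repayment window.
        long effectiveWindowMs = Math.max(catchUpWindowMs, avgTaskMs);
        // Running k extra slots for effectiveWindowMs repays k * effectiveWindowMs
        // of deficit, so k = ceil(deficit / window) slots suffice to catch up.
        long slots = (deficitMs + effectiveWindowMs - 1) / effectiveWindowMs;
        return (int) Math.min(slots, freeSlots);
    }

    public static void main(String[] args) {
        // Job owed 10 min of slot-time, 1-min tasks, 5-min catch-up window:
        // grants 2 slots rather than every free slot on the cluster.
        System.out.println(slotsToGrant(600_000, 60_000, 300_000, 100));
    }
}
```

With 10-minute tasks instead, the effective window stretches to 10 minutes and the grant drops to a single slot, so a long-task job never land-grabs the cluster the moment it reaches max deficit.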

> large pending jobs hog resources
> --------------------------------
>                 Key: HADOOP-4803
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4803
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Joydeep Sen Sarma
>            Assignee: Matei Zaharia
> observing the cluster over the last day - one thing i noticed is that small jobs (single-digit
> task counts) are not doing a good job competing against large jobs. what seems to happen is:
> - a large job comes along and has to wait a while behind other large jobs.
> - slots are slowly transferred from one large job to another.
> - small tasks keep waiting forever.
> is this an artifact of deficit-based scheduling? it seems that long-pending large jobs
> are out-scheduling small jobs
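The starvation mechanism described in the report can be seen in a toy model of deficit accumulation. The update rule below (deficit grows with the gap between fair share and running tasks) is a simplification of the contrib/fair-share design, and the class and method names are hypothetical, not the scheduler's code.

```java
// Toy model of why a long-pending large job out-schedules small jobs under
// pure deficit ordering: its deficit grows in proportion to its (large) fair
// share, so after a long wait it dwarfs any small job's deficit and wins
// every free slot. Simplified illustration, not the actual scheduler code.
public class DeficitDemo {

    /** Deficit accumulated after waiting waitMs with this share/running gap. */
    public static long deficitAfterWait(int fairShare, int runningTasks, long waitMs) {
        return (long) (fairShare - runningTasks) * waitMs;
    }

    public static void main(String[] args) {
        long tenMinMs = 10 * 60 * 1000L;
        // Large pending job: fair share of 50 slots, none running, 10-min wait.
        long largeDeficit = deficitAfterWait(50, 0, tenMinMs);
        // Small job: fair share of 2 slots, none running, same 10-min wait.
        long smallDeficit = deficitAfterWait(2, 0, tenMinMs);
        // The large job's deficit is 25x the small job's, so it stays at the
        // head of the queue and claims each slot as it frees up, while the
        // small job's deficit never catches up - matching the report.
        System.out.println(largeDeficit + " vs " + smallDeficit);
    }
}
```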

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
