hadoop-common-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5075) Potential infinite loop in updateMinSlots
Date Tue, 20 Jan 2009 20:19:02 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665540#action_12665540 ]

Joydeep Sen Sarma commented on HADOOP-5075:
-------------------------------------------

Question regarding the 'break' in the slotsLeft == oldSlots case:

This doesn't look correct to me. There seems to be no guarantee that all available slots
are distributed in one round, which is why we previously had a for loop over the slots. Are
we now claiming that going over the jobs one last time will be enough to distribute all
the slots?

The basic problem seems to be:

    int share = (int) Math.ceil(oldSlots * weight / totalWeight);
    slotsLeft = giveMinSlots(job, type, slotsLeft, share);

I believe that the share computed is quite likely to be less than the maximum number of slots
that the job can consume. So going from 'floor' to 'ceil' may not be enough to guarantee
that slots get consumed (and certainly not enough to guarantee that *all* the remaining slots
get consumed).
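
To make the concern concrete, here is a small stand-alone model of the loop. It is only a
sketch and not the fair scheduler code: the demand[] and weight[] arrays, the class name and
this simplified giveMinSlots (a job takes at most its share, its remaining demand, and the
slots still available) are all assumptions made for illustration. With two equally weighted
jobs that want 1 and 9 slots, a single pass over 10 slots leaves slots unassigned even
though the second job could still use them.

```java
import java.util.Arrays;

// Simplified model of the weighted min-slot loop discussed above.
// Not the FairScheduler code: demand[], weight[] and the helpers are hypothetical.
class MinSlotsSketch {

    // Models giveMinSlots: a job takes at most its share, its remaining demand,
    // and the slots still available. Returns the slots left afterwards.
    static int giveMinSlots(int[] demand, int[] given, int job, int slotsLeft, int share) {
        int grant = Math.min(share, Math.min(demand[job] - given[job], slotsLeft));
        given[job] += grant;
        return slotsLeft - grant;
    }

    // One pass over all jobs, each offered ceil(oldSlots * weight / totalWeight).
    static int onePass(int[] demand, double[] weight, int[] given, int slotsLeft) {
        double totalWeight = Arrays.stream(weight).sum();
        int oldSlots = slotsLeft;
        for (int job = 0; job < demand.length; job++) {
            int share = (int) Math.ceil(oldSlots * weight[job] / totalWeight);
            slotsLeft = giveMinSlots(demand, given, job, slotsLeft, share);
        }
        return slotsLeft;
    }

    public static void main(String[] args) {
        int[] demand = {1, 9};        // slots each job could still consume
        double[] weight = {1.0, 1.0};
        int[] given = new int[2];
        int slotsLeft = 10;
        int rounds = 0;
        while (slotsLeft > 0) {
            int oldSlots = slotsLeft;
            slotsLeft = onePass(demand, weight, given, slotsLeft);
            rounds++;
            System.out.println("round " + rounds + ": slotsLeft=" + slotsLeft);
            if (slotsLeft == oldSlots) break; // no progress: stop rather than spin
        }
    }
}
```

In this model the first pass hands out 1 and 5 slots and leaves 4, so it takes several
rounds before everything is assigned. One pass is not sufficient in general.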

My gut feeling is that the correct solution (when oldSlots == slotsLeft) should be something
that takes into account the maximum number of tasks that a job can consume (as opposed to only
its weighted share).


> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>             Fix For: 0.19.1, 0.20.0, 0.21.0
>
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was
> repeating infinitely. This might happen if, due to rounding, we are unable to assign the last
> few slots in a pool. This patch adds a break statement to ensure that the loop exits if it
> hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

