mesos-issues mailing list archives

From "Alexander Rukletsov (JIRA)" <>
Subject [jira] [Commented] (MESOS-5698) Quota sorter not updated for resource changes at agent.
Date Fri, 24 Jun 2016 13:02:16 GMT


Alexander Rukletsov commented on MESOS-5698:

Promoted this to a blocker for 1.0.0.

In the update allocation function we also update total resources, which is misleading.
Because the quota role sorter does not update role allocations for roles that have no quota
set, it never calls the 4-argument allocate in the sorter and hence does not account for
changes to total resources properly.
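
A minimal sketch of the gap described above, not the Mesos implementation: `Disk`, `ToySorter`, `ToyAllocator`, and `reserveWithoutQuota` are hypothetical names, and resources are reduced to two integers (MB of unreserved and reserved disk). The only assumption carried over from the issue is that the quota role sorter is updated solely when the role has a quota.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified resource model: MB of disk on one agent.
struct Disk {
  int unreserved;
  int reserved;
};

// Toy stand-in for a sorter; it tracks only the per-agent total.
struct ToySorter {
  std::map<std::string, Disk> total;

  void add(const std::string& agent, const Disk& disk) { total[agent] = disk; }

  // Models a reservation: moves `mb` from unreserved to reserved.
  void update(const std::string& agent, int mb) {
    Disk& disk = total[agent];
    disk.unreserved -= mb;
    disk.reserved += mb;
  }
};

// Toy allocator holding the two sorters relevant to this issue.
struct ToyAllocator {
  ToySorter roleSorter;
  ToySorter quotaRoleSorter;

  void addAgent(const std::string& agent, const Disk& disk) {
    roleSorter.add(agent, disk);
    quotaRoleSorter.add(agent, disk);
  }

  // The total is refreshed as a side effect of updating the allocation,
  // so a sorter that is skipped here keeps a stale total.
  void updateAllocation(const std::string& agent, int mb, bool roleHasQuota) {
    roleSorter.update(agent, mb);
    if (roleHasQuota) {
      quotaRoleSorter.update(agent, mb);
    }
  }
};

// An agent with 128MB of disk; a role without quota reserves 1MB.
inline ToyAllocator reserveWithoutQuota() {
  ToyAllocator allocator;
  allocator.addAgent("agent1", Disk{128, 0});
  allocator.updateAllocation("agent1", 1, /*roleHasQuota=*/false);
  return allocator;
}
```

In this toy model the role sorter ends up with 127MB unreserved and 1MB reserved, while the quota role sorter still records 128MB unreserved. Updating totals independently of whether the role has a quota would avoid the divergence in the sketch.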

> Quota sorter not updated for resource changes at agent.
> -------------------------------------------------------
>                 Key: MESOS-5698
>                 URL:
>             Project: Mesos
>          Issue Type: Bug
>          Components: allocation
>    Affects Versions: 0.27.3, 0.28.2
>            Reporter: Neil Conway
>            Assignee: Klaus Ma
>            Priority: Blocker
>              Labels: mesosphere, quota
>             Fix For: 1.0.0
> Consider this sequence of events:
> 1. Slave connects, with 128MB of disk.
> 2. Master offers the slave's resources to a framework.
> 3. Framework creates a dynamic reservation for 1MB and a persistent volume of the same
> size on the slave's resources.
>   => This invokes {{Master::apply}}, which invokes {{allocator->updateAllocation}},
> which invokes {{Sorter::update()}} on the framework sorter and role sorter. If the framework's
> role has a configured quota, it also invokes {{update}} on the quota role sorter -- in this
> case, the framework's role has no quota, so the quota role sorter is *not* updated.
>   => {{DRFSorter::update}} updates the *total* resources at a given slave, in addition to
> updating other state. The new total resources will be 127MB of unreserved disk and 1MB of
> reserved disk with a volume. Note that the quota role sorter still thinks the slave has
> 128MB of unreserved disk.
> 4. The slave is removed from the cluster. {{HierarchicalAllocatorProcess::removeSlave}} does:
> {code}
>   roleSorter->remove(slaveId, slaves[slaveId].total);
>   quotaRoleSorter->remove(slaveId, slaves[slaveId].total.nonRevocable());
> {code}
> {{slaves\[slaveId\].total.nonRevocable()}} is 127MB of unreserved disk and 1MB of reserved
> disk with a volume. When we remove this from the quota role sorter, we're left with total
> resources on the removed slave of 1MB of unreserved disk, since that is the result of subtracting
> <127MB unreserved, 1MB reserved+volume> from <128MB unreserved>.
> The implications of this can't be good: at minimum, we're leaking resources for removed
> slaves in the quota role sorter. We're also introducing an inconsistency between {{total_.resources\[slaveId\]}}
> and {{total_.scalarQuantities}}, since the latter has already stripped out volume/reservation
> information.
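
The subtraction in step 4 can be worked through with a small sketch. This is a toy model, not the Mesos `Resources` class: `ToyResources`, `subtract`, and `leakedAfterRemove` are hypothetical, and the sketch assumes that subtracting a resource kind absent from the left-hand side is a no-op, which is the behaviour the description above implies.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Toy resource bag: kind of disk -> MB. The kind strings are illustrative only.
using ToyResources = std::map<std::string, int>;

// Subtracts `rhs` from `lhs`. Assumed semantics for this sketch: kinds that
// `lhs` does not contain are skipped, results are clamped at zero, and
// entries that reach zero are dropped.
inline ToyResources subtract(ToyResources lhs, const ToyResources& rhs) {
  for (const auto& entry : rhs) {
    auto it = lhs.find(entry.first);
    if (it == lhs.end()) {
      continue;  // Nothing of this kind to subtract from.
    }
    it->second = std::max(0, it->second - entry.second);
    if (it->second == 0) {
      lhs.erase(it);
    }
  }
  return lhs;
}

// What remains in the quota role sorter after the agent is removed.
inline ToyResources leakedAfterRemove() {
  // Stale total: the quota role sorter never saw the reservation.
  const ToyResources staleTotal = {{"disk(unreserved)", 128}};

  // Actual agent total at removal time.
  const ToyResources agentTotal = {{"disk(unreserved)", 127},
                                   {"disk(reserved+volume)", 1}};

  return subtract(staleTotal, agentTotal);
}
```

Under these assumptions the result is a single entry of 1MB of unreserved disk that still belongs to an agent no longer in the cluster, which is the leak the description refers to.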

This message was sent by Atlassian JIRA
