mesos-issues mailing list archives

From "Alexander Rukletsov (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-5698) Quota sorter not updated for resource changes at agent.
Date Fri, 24 Jun 2016 13:02:16 GMT

     [ https://issues.apache.org/jira/browse/MESOS-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Rukletsov updated MESOS-5698:
---------------------------------------
    Assignee: Neil Conway  (was: Klaus Ma)

> Quota sorter not updated for resource changes at agent.
> -------------------------------------------------------
>
>                 Key: MESOS-5698
>                 URL: https://issues.apache.org/jira/browse/MESOS-5698
>             Project: Mesos
>          Issue Type: Bug
>          Components: allocation
>    Affects Versions: 0.27.3, 0.28.2
>            Reporter: Neil Conway
>            Assignee: Neil Conway
>            Priority: Blocker
>              Labels: mesosphere, quota
>             Fix For: 1.0.0
>
>
> Consider this sequence of events:
> 1. Slave connects, with 128MB of disk.
> 2. Master offers the slave's resources to a framework.
> 3. Framework creates a dynamic reservation for 1MB and a persistent volume of the same
size on the slave's resources.
>   => This invokes {{Master::apply}}, which invokes {{allocator->updateAllocation}},
which invokes {{Sorter::update()}} on the framework sorter and role sorter. If the framework's
role has a configured quota, it also invokes {{update}} on the quota role sorter -- in this
case, the framework's role has no quota, so the quota role sorter is *not* updated.
>   => {{DRFSorter::update}} updates the *total* resources at a given slave, among other
state. New total resources will be 127MB of unreserved disk and 1MB of reserved disk
with a volume. Note that the quota role sorter still thinks the slave has 128MB of unreserved
disk.
> 4. The slave is removed from the cluster. {{HierarchicalAllocatorProcess::removeSlave}}
invokes:
> {code}
>   roleSorter->remove(slaveId, slaves[slaveId].total);
>   quotaRoleSorter->remove(slaveId, slaves[slaveId].total.nonRevocable());
> {code}
> {{slaves\[slaveId\].total.nonRevocable()}} is 127MB of unreserved disk and 1MB of reserved
disk with a volume. When we remove this from the quota role sorter, we're left with total
resources on the removed slave of 1MB of unreserved disk, since that is the result of subtracting
<127MB unreserved, 1MB reserved+volume> from <128MB unreserved>.
> The implications of this can't be good: at minimum, we're leaking resources for removed
slaves in the quota role sorter. We're also introducing an inconsistency between {{total_.resources\[slaveId\]}}
and {{total_.scalarQuantities}}, since the latter has already stripped out volume/reservation
information.
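
The arithmetic behind the leak in step 4 can be sketched with a toy model. This is not the Mesos {{Resources}} class: {{DiskTotals}}, {{subtract}}, {{quotaSorterAfterRemoval}} and the string labels are hypothetical names chosen for illustration, and the sketch assumes that subtraction silently ignores any kind of resource the left-hand side does not track, which is the behaviour the description above implies.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for a per-slave total: megabytes of disk keyed by a
// hypothetical "kind" label. This is NOT the Mesos Resources class.
using DiskTotals = std::map<std::string, long>;

// Subtract `rhs` from `lhs` kind by kind. Kinds that `lhs` does not track
// are ignored (assumption: this mirrors the subtraction described in the
// issue). Kinds that drop to zero are erased.
DiskTotals subtract(DiskTotals lhs, const DiskTotals& rhs) {
  for (const auto& entry : rhs) {
    assert(entry.second >= 0);
    auto it = lhs.find(entry.first);
    if (it == lhs.end()) {
      continue;  // Nothing of this kind is tracked; silently ignored.
    }
    it->second -= std::min(it->second, entry.second);
    if (it->second == 0) {
      lhs.erase(it);
    }
  }
  return lhs;
}

// Replays the slave removal from the quota role sorter's point of view.
DiskTotals quotaSorterAfterRemoval() {
  // The quota role sorter was never updated, so it still holds the
  // total recorded when the slave connected (step 1).
  const DiskTotals staleSorterTotal = {{"unreserved", 128}};

  // The allocator's view of the slave after the reservation and
  // volume were created (step 3).
  const DiskTotals slaveTotalAtRemoval = {
      {"unreserved", 127}, {"reserved+volume", 1}};

  return subtract(staleSorterTotal, slaveTotalAtRemoval);
}
```

Under these assumptions, {{quotaSorterAfterRemoval()}} returns a single entry of 1MB of unreserved disk for a slave that no longer exists, which is the leaked remainder described above.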



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
