mesos-issues mailing list archives

From "Dario Rexin (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MESOS-4694) DRFAllocator takes very long to allocate resources with a large number of frameworks
Date Wed, 17 Feb 2016 19:29:18 GMT
Dario Rexin created MESOS-4694:
----------------------------------

             Summary: DRFAllocator takes very long to allocate resources with a large number of frameworks
                 Key: MESOS-4694
                 URL: https://issues.apache.org/jira/browse/MESOS-4694
             Project: Mesos
          Issue Type: Bug
          Components: allocation
    Affects Versions: 0.26.0, 0.27.0, 0.27.1
            Reporter: Dario Rexin
            Assignee: Dario Rexin


As the number of connected frameworks grows, allocation time becomes very long: roughly 2.9
seconds per allocation round with 200 frameworks and about 28 seconds with 2000 frameworks
(both with 2000 slaves). The addition of quota in 0.27 increased these times further. Running
`mesos-tests.sh --benchmark --gtest_filter=HierarchicalAllocator_BENCHMARK_Test.DeclineOffers`
gives us the following numbers:

{noformat}
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from HierarchicalAllocator_BENCHMARK_Test
[ RUN      ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
Using 2000 slaves and 200 frameworks
round 0 allocate took 2.921202secs to make 200 offers
round 1 allocate took 2.85045secs to make 200 offers
round 2 allocate took 2.823768secs to make 200 offers
{noformat}

Increasing the number of frameworks to 2000:

{noformat}
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from HierarchicalAllocator_BENCHMARK_Test
[ RUN      ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
Using 2000 slaves and 2000 frameworks
round 0 allocate took 28.209454secs to make 2000 offers
round 1 allocate took 28.469419secs to make 2000 offers
round 2 allocate took 28.138086secs to make 2000 offers
{noformat}

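I was able to reduce this time substantially: from about 2.9 seconds to about 1.1 seconds per
round with 200 frameworks, and from about 28 seconds to about 12.5 seconds with 2000 frameworks.
After applying the patches, with 200 frameworks: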

{noformat}
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from HierarchicalAllocator_BENCHMARK_Test
[ RUN      ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
Using 2000 slaves and 200 frameworks
round 0 allocate took 1.016226secs to make 2000 offers
round 1 allocate took 1.102729secs to make 2000 offers
round 2 allocate took 1.102624secs to make 2000 offers
{noformat}

And with 2000 frameworks:

{noformat}
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from HierarchicalAllocator_BENCHMARK_Test
[ RUN      ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
Using 2000 slaves and 2000 frameworks
round 0 allocate took 12.563203secs to make 2000 offers
round 1 allocate took 12.437517secs to make 2000 offers
round 2 allocate took 12.470708secs to make 2000 offers
{noformat}
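
To put the four benchmark runs above side by side, the following small Python sketch averages the three rounds of each run and computes the speedup. The round times are copied from the output above; the variable and function names are only illustrative.

```python
from statistics import mean

# Round times in seconds, copied from the benchmark output above
# (2000 slaves in every run, keyed by number of frameworks).
before = {
    200: [2.921202, 2.85045, 2.823768],
    2000: [28.209454, 28.469419, 28.138086],
}
after = {
    200: [1.016226, 1.102729, 1.102624],
    2000: [12.563203, 12.437517, 12.470708],
}


def speedup(frameworks):
    """Ratio of mean round time before the patches to mean round time after."""
    return mean(before[frameworks]) / mean(after[frameworks])


for n in (200, 2000):
    print(
        f"{n} frameworks: {mean(before[n]):.2f}s -> {mean(after[n]):.2f}s "
        f"({speedup(n):.1f}x faster)"
    )
```

By this measure the patches make a round roughly 2.7 times faster with 200 frameworks and roughly 2.3 times faster with 2000 frameworks.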

The patches improve the performance of the allocator in three ways:

1) The total values in the DRFSorter are precalculated per resource type.

2) In the allocate method, when no resources are available to allocate, we break out of the
innermost loop to avoid looping over a large number of frameworks when we have nothing to
allocate.

3) When a framework suppresses offers, we remove it from the sorter instead of just calling
continue in the allocation loop. This greatly improves performance in the sorter and prevents
looping over frameworks that don't need resources.
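
The three changes can be illustrated with a toy model. The sketch below is written in Python for brevity and is not the Mesos C++ code from the review requests: `ToySorter`, `allocate`, `suppress_offers` and `revive_offers` are hypothetical names, the allocation policy is simplified to "offer everything left on a slave to the framework with the lowest dominant share", and how the real sorter stores its totals is an assumption.

```python
from collections import defaultdict


class ToySorter:
    """Toy stand-in for a DRF sorter (hypothetical, not the Mesos class)."""

    def __init__(self):
        self.totals = defaultdict(float)  # (1) cached total per resource type
        self.allocations = {}             # framework -> {resource: amount}
        self.clients = set()              # frameworks that take part in sorting

    def add_total(self, resources):
        # (1) Update the cached totals once when capacity is added, so that
        # share calculations do not have to re-sum all resources each time.
        for name, amount in resources.items():
            self.totals[name] += amount

    def add(self, framework):
        self.clients.add(framework)
        self.allocations.setdefault(framework, defaultdict(float))

    def remove(self, framework):
        self.clients.discard(framework)

    def allocated(self, framework, resources):
        for name, amount in resources.items():
            self.allocations[framework][name] += amount

    def dominant_share(self, framework):
        shares = [
            amount / self.totals[name]
            for name, amount in self.allocations[framework].items()
            if self.totals[name] > 0
        ]
        return max(shares, default=0.0)

    def sort(self):
        # Lowest dominant share first; the name breaks ties deterministically.
        return sorted(self.clients, key=lambda f: (self.dominant_share(f), f))


def suppress_offers(sorter, framework):
    # (3) A suppressed framework leaves the sorter entirely, so it is neither
    # sorted nor visited in the allocation loop.
    sorter.remove(framework)


def revive_offers(sorter, framework):
    sorter.add(framework)


def allocate(sorter, slaves):
    """Offer each slave's resources; returns (offers, frameworks visited).

    `slaves` maps slave id -> {resource: available amount} and is mutated.
    """
    offers, visited = [], 0
    for slave_id, available in slaves.items():
        for framework in sorter.sort():
            if not any(amount > 0 for amount in available.values()):
                break  # (2) nothing left on this slave: stop iterating
            visited += 1
            offer = dict(available)
            available.clear()
            sorter.allocated(framework, offer)
            offers.append((framework, slave_id, offer))
    return offers, visited


# Small demonstration: 3 slaves, 4 frameworks, one of which suppresses offers.
sorter = ToySorter()
slaves = {f"slave-{i}": {"cpus": 4.0, "mem": 1024.0} for i in range(3)}
for resources in slaves.values():
    sorter.add_total(resources)
for name in ("fw-a", "fw-b", "fw-c", "fw-d"):
    sorter.add(name)
suppress_offers(sorter, "fw-a")
offers, visited = allocate(sorter, slaves)
print([(fw, slave) for fw, slave, _ in offers], "visited:", visited)
```

In this toy run the suppressed framework never appears in the sort, and the inner loop stops after the single framework that receives each slave's resources, so the number of frameworks visited equals the number of slaves rather than slaves times frameworks.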

Assuming that most frameworks behave nicely and suppress offers when they have nothing
to schedule, it is fair to assume that point 3) has the biggest impact on performance.
If we suppress offers for 90% of the frameworks in the benchmark test, we see the following numbers:

{noformat}
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from HierarchicalAllocator_BENCHMARK_Test
[ RUN      ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
Using 200 slaves and 2000 frameworks
round 0 allocate took 11626us to make 200 offers
round 1 allocate took 22890us to make 200 offers
round 2 allocate took 21346us to make 200 offers
{noformat}

And for 200 frameworks:

{noformat}
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from HierarchicalAllocator_BENCHMARK_Test
[ RUN      ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
Using 2000 slaves and 2000 frameworks
round 0 allocate took 1.11178secs to make 2000 offers
round 1 allocate took 1.062649secs to make 2000 offers
round 2 allocate took 1.080181secs to make 2000 offers
{noformat}

Review requests:

https://reviews.apache.org/r/43665/
https://reviews.apache.org/r/43666/
https://reviews.apache.org/r/43668/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
