aurora-issues mailing list archives

From "Stephan Erb (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-1802) AttributeAggregate slows down scheduling of jobs with many instances
Date Thu, 17 Nov 2016 23:33:58 GMT

    [ https://issues.apache.org/jira/browse/AURORA-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15675147#comment-15675147 ]

Stephan Erb commented on AURORA-1802:
-------------------------------------

Scheduling performance benchmark: https://reviews.apache.org/r/53862/

> AttributeAggregate slows down scheduling of jobs with many instances
> --------------------------------------------------------------------
>
>                 Key: AURORA-1802
>                 URL: https://issues.apache.org/jira/browse/AURORA-1802
>             Project: Aurora
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Stephan Erb
>             Fix For: 0.17.0
>
>
> The current implementation of [{{AttributeAggregate}}|https://github.com/apache/aurora/blob/f559e930659e25b3d7cacb7b845ebda50d18d66a/src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java]
slows down scheduling of jobs with many instances. Interestingly, this is currently not visible
in our job scheduling benchmark results as it only affects the benchmark setup time but not
the measured part.
> {{AttributeAggregate}} relies on {{Suppliers.memoize}} to ensure that it is only computed
once and only when necessary. This has probably been done because the factory [{{AttributeAggregate.getJobActiveState}}|https://github.com/apache/aurora/blob/f559e930659e25b3d7cacb7b845ebda50d18d66a/src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java#L56-L91]
is slow. 
> After some recent changes to schedule multiple task instances per scheduling round, the
aggregate is computed in each scheduling round via the call [{{resourceRequest.getJobState().updateAttributeAggregate(...)}}|https://github.com/apache/aurora/blob/f559e930659e25b3d7cacb7b845ebda50d18d66a/src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java#L173]
in {{TaskAssigner}}. This means the expensive factory is called once per scheduling round.
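
The effect described above can be sketched in plain Java. This is a hypothetical illustration, not Aurora code: {{MemoDemo}} and its methods are made-up names, and a small hand-written memoizer stands in for Guava's {{Suppliers.memoize}} to keep the sketch dependency-free. It assumes that each scheduling round ends up with a fresh memoized supplier, in which case memoization no longer saves any work.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch, not Aurora code.
final class MemoDemo {
  /** Wraps a supplier so the delegate runs at most once (similar in spirit to Suppliers.memoize). */
  static <T> Supplier<T> memoize(Supplier<T> delegate) {
    return new Supplier<T>() {
      private boolean computed;
      private T value;

      @Override
      public synchronized T get() {
        if (!computed) {
          value = delegate.get();
          computed = true;
        }
        return value;
      }
    };
  }

  /** A fresh memoized supplier per round: the "expensive" factory runs once per round. */
  static int factoryCallsWhenRebuilt(int rounds) {
    AtomicInteger calls = new AtomicInteger();
    for (int i = 0; i < rounds; i++) {
      Supplier<String> aggregate = memoize(() -> {
        calls.incrementAndGet(); // stands in for the slow factory
        return "aggregate";
      });
      aggregate.get();
      aggregate.get(); // second read within the round is served from the cached value
    }
    return calls.get();
  }

  /** One memoized supplier shared across rounds: the factory runs once in total. */
  static int factoryCallsWhenReused(int rounds) {
    AtomicInteger calls = new AtomicInteger();
    Supplier<String> aggregate = memoize(() -> {
      calls.incrementAndGet();
      return "aggregate";
    });
    for (int i = 0; i < rounds; i++) {
      aggregate.get();
    }
    return calls.get();
  }
}
```

In this sketch the number of factory calls grows with the number of rounds in the first method and stays at one in the second, which is the difference the memoization was presumably meant to buy.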
> h3. Potential improvements
> * The current factory implementation performs one {{fetchTasks}} query followed by {{n}}
distinct {{getHostAttributes}} queries. This could be reduced to a single SQL query.
> * The aggregate makes heavy use of {{ImmutableMultiset}} even though the aggregate itself is
no longer immutable. There is potential room for improvement here.
> * The aggregate uses suppliers to perform a lazy instantiation even though its current
usage is not lazy any more. We can either make the implementation eager, or ensure that the
expensive part is only run when absolutely necessary.
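
One possible shape for these improvements, as a rough sketch only: the class below is hypothetical ({{EagerAttributeCounts}}, {{batchLookup}} and the method names are not part of Aurora). It assumes a single batched lookup can return the attributes of all hosts at once, simplifies each host to one value per attribute name, builds the counts eagerly, and updates them in place instead of copying immutable collections.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Aurora's AttributeAggregate.
final class EagerAttributeCounts {
  // attribute name -> attribute value -> number of tasks on hosts carrying that value
  private final Map<String, Map<String, Integer>> counts = new HashMap<>();

  /**
   * Builds the counts eagerly with one lookup call. batchLookup stands in for a single
   * query that maps every requested host to its attributes (attribute name -> value).
   */
  static EagerAttributeCounts create(
      List<String> taskHosts,
      Function<Collection<String>, Map<String, Map<String, String>>> batchLookup) {
    EagerAttributeCounts result = new EagerAttributeCounts();
    Map<String, Map<String, String>> attributesByHost = batchLookup.apply(taskHosts);
    for (String host : taskHosts) {
      // Hosts without known attributes contribute nothing.
      result.add(attributesByHost.getOrDefault(host, Collections.emptyMap()));
    }
    return result;
  }

  /** Records one more task on a host with the given attributes, mutating the counts in place. */
  void add(Map<String, String> hostAttributes) {
    hostAttributes.forEach((name, value) ->
        counts.computeIfAbsent(name, k -> new HashMap<>()).merge(value, 1, Integer::sum));
  }

  /** Returns how many recorded tasks run on hosts with the given attribute name and value. */
  int count(String name, String value) {
    return counts.getOrDefault(name, Collections.emptyMap()).getOrDefault(value, 0);
  }
}
```

With this layout, scheduling a further instance in the same round would only need a call to {{add}} with the chosen host's attributes rather than a rebuild of the whole aggregate. Whether that fits the real {{TaskAssigner}} flow is an open question that this sketch does not answer.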
> h3. Proof of concept
> * 4 mins 23.407 secs -- total runtime of {{./gradlew jmh -Pbenchmarks='SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark'}}
(unmodified baseline)
> * 2 mins 40.308 secs -- total runtime of {{./gradlew jmh -Pbenchmarks='SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark'}}
with [{{resourceRequest.getJobState().updateAttributeAggregate(...)}}|https://github.com/apache/aurora/blob/f559e930659e25b3d7cacb7b845ebda50d18d66a/src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java#L173]
commented out. This works because the call is not necessary when only a single instance is
scheduled per scheduling round, as done in the benchmarks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
