aurora-issues mailing list archives

From "Maxim Khutornenko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-1600) Job updates with large count of instance overrides halt scheduler perf
Date Wed, 10 Feb 2016 00:54:18 GMT

    [ https://issues.apache.org/jira/browse/AURORA-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140124#comment-15140124 ]

Maxim Khutornenko commented on AURORA-1600:
-------------------------------------------

Unreverting: https://reviews.apache.org/r/43396/

> Job updates with large count of instance overrides halt scheduler perf
> ----------------------------------------------------------------------
>
>                 Key: AURORA-1600
>                 URL: https://issues.apache.org/jira/browse/AURORA-1600
>             Project: Aurora
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Maxim Khutornenko
>            Assignee: Maxim Khutornenko
>            Priority: Critical
>             Fix For: 0.12.0
>
>
> We have observed a case where a user update with a large number of specified instance
> overrides (updateOnlyTheseInstances) degrades performance so severely that the scheduler
> processes almost no offers and schedules no pending tasks for long periods (minutes to hours).
> The culprit appears to be the {{selectInstructions}} query. It is unacceptably slow when
> the number of instanceConfigs and/or instance overrides approaches 100. Since it is called inside
> a write lock to guide individual instance updates, nothing else can proceed, including status
> updates and offer activities.
> I was able to replicate this in jmh. Fix is incoming.
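
For readers unfamiliar with the failure mode described above, here is a minimal, self-contained Java sketch of how a slow call made while holding a write lock stalls every other lock user. It does not use Aurora code: the class WriteLockStall, the slowQuery helper (a stand-in for a slow query such as selectInstructions) and the timings are all hypothetical. The "outside the lock" variant is included only for contrast and is not a claim about how the actual fix works.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class WriteLockStall {
    // Hypothetical stand-in for a slow storage query.
    static void slowQuery(long millis) throws InterruptedException {
        Thread.sleep(millis);
    }

    // Returns true if a reader can take the read lock within waitMillis
    // while a writer thread is running the slow query.
    static boolean readerProceeds(boolean queryInsideLock, long queryMillis, long waitMillis)
            throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch started = new CountDownLatch(1);

        Thread writer = new Thread(() -> {
            try {
                if (queryInsideLock) {
                    lock.writeLock().lock();
                    try {
                        started.countDown();
                        slowQuery(queryMillis); // slow work while holding the write lock
                    } finally {
                        lock.writeLock().unlock();
                    }
                } else {
                    started.countDown();
                    slowQuery(queryMillis);     // slow work without the lock
                    lock.writeLock().lock();    // lock held only for a brief mutation
                    lock.writeLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        writer.start();
        started.await(); // wait until the writer has begun its slow phase
        boolean acquired = lock.readLock().tryLock(waitMillis, TimeUnit.MILLISECONDS);
        if (acquired) {
            lock.readLock().unlock();
        }
        writer.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("query inside lock, reader proceeds: "
                + readerProceeds(true, 500, 100));
        System.out.println("query outside lock, reader proceeds: "
                + readerProceeds(false, 500, 100));
    }
}
```

In the first call the reader times out because the write lock is held for the full duration of the slow call. This is analogous to status updates and offer handling being unable to proceed while the query runs.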



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
