Date: Wed, 10 Feb 2016 00:54:18 +0000 (UTC)
From: "Maxim Khutornenko (JIRA)"
To: issues@aurora.apache.org
Reply-To: dev@aurora.apache.org
Subject: [jira] [Commented] (AURORA-1600) Job updates with large count of instance overrides halt scheduler perf


    [ https://issues.apache.org/jira/browse/AURORA-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140124#comment-15140124 ]

Maxim Khutornenko commented on AURORA-1600:
-------------------------------------------

Unreverting: https://reviews.apache.org/r/43396/

> Job updates with large count of instance overrides halt scheduler perf
> ----------------------------------------------------------------------
>
>                 Key: AURORA-1600
>                 URL: https://issues.apache.org/jira/browse/AURORA-1600
>             Project: Aurora
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Maxim Khutornenko
>            Assignee: Maxim Khutornenko
>            Priority: Critical
>             Fix For: 0.12.0
>
>
> We have observed a case where a user update with a large number of specified instance overrides (updateOnlyTheseInstances) results in significant performance deterioration, to the extent of the scheduler processing almost no offers and not scheduling any pending tasks for long periods (minutes to hours).
> The culprit appears to be the {{selectInstructions}} query. It's unacceptably slow when the number of instanceConfigs and/or instance overrides approaches 100. Since it's called inside a write lock to guide individual instance updates, nothing else can proceed, including status updates and offer activities.
> I was able to replicate this in jmh. Fix is incoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
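
The blocking effect described in the quoted issue (a slow query running inside a write lock, so nothing else can proceed) can be illustrated with a minimal Java sketch. This is not Aurora code: the class WriteLockStall, the field storageLock, and both methods are hypothetical names, and the slow selectInstructions query is simulated by a latch that keeps the write lock held until a concurrent reader has made its attempt.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustration only: all names here are hypothetical, not Aurora's API.
class WriteLockStall {
    // Stand-in for the lock that guards scheduler storage.
    private final ReentrantReadWriteLock storageLock = new ReentrantReadWriteLock();

    // Waits on a latch, converting interruption into an unchecked exception.
    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Returns whether a reader (think: a status update) can take the read
    // lock while another thread is busy with a slow query under the write lock.
    boolean readerCanProceedDuringSlowQuery() {
        CountDownLatch queryStarted = new CountDownLatch(1);
        CountDownLatch readerAttempted = new CountDownLatch(1);

        Thread updater = new Thread(() -> {
            storageLock.writeLock().lock();
            try {
                queryStarted.countDown();
                // Simulated slow query: the write lock stays held until
                // the reader has tried (and failed) to get in.
                await(readerAttempted);
            } finally {
                storageLock.writeLock().unlock();
            }
        });
        updater.start();

        await(queryStarted);
        // Non-blocking attempt: fails while another thread holds the write lock.
        boolean acquired = storageLock.readLock().tryLock();
        if (acquired) {
            storageLock.readLock().unlock();
        }
        readerAttempted.countDown();

        try {
            updater.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return acquired;
    }

    // Returns whether a reader can take the read lock when no writer is active.
    boolean readerCanProceedWhenIdle() {
        boolean acquired = storageLock.readLock().tryLock();
        if (acquired) {
            storageLock.readLock().unlock();
        }
        return acquired;
    }

    public static void main(String[] args) {
        WriteLockStall demo = new WriteLockStall();
        System.out.println("reader proceeds during slow query: "
                + demo.readerCanProceedDuringSlowQuery());
        System.out.println("reader proceeds when idle: "
                + demo.readerCanProceedWhenIdle());
    }
}
```

In this sketch the reader is refused for as long as the simulated query holds the write lock and succeeds once the lock is free. This mirrors the reported symptom in spirit only; the actual fix is the one in the review linked above, not anything shown here.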