Date: Tue, 24 Jun 2014 00:14:24 +0000 (UTC)
From: "Bill Farner (JIRA)"
To: issues@aurora.incubator.apache.org
Reply-To: dev@aurora.incubator.apache.org
Subject: [jira] [Commented] (AURORA-548) scheduler should always show tasks_lost_rack_XXX metrics

    [ 
https://issues.apache.org/jira/browse/AURORA-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041509#comment-14041509 ]

Bill Farner commented on AURORA-548:
------------------------------------

The scheduler doesn't necessarily know about all the racks in the cluster, and may only know about certain racks intermittently. I'm afraid the best approach might be to not assume all racks are present.

> scheduler should always show tasks_lost_rack_XXX metrics
> --------------------------------------------------------
>
>                 Key: AURORA-548
>                 URL: https://issues.apache.org/jira/browse/AURORA-548
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: David Robinson
>
> The scheduler's /vars endpoint only exposes a tasks_lost_rack_XXX metric when tasks in a rack have been lost (a tasks_lost_rack_XXX key has a non-zero value). If no tasks in a rack have been lost then metrics for the rack are not exposed. This makes the metrics difficult to use for alerting purposes -- it's impossible to tell whether the rack does not exist or exists but has had no lost tasks. Each rack should have an entry in /vars regardless of whether there have been any lost tasks.
>
> Sample metrics:
> tasks_lost_rack_aab 3
> tasks_lost_rack_aae 4
> tasks_lost_rack_aah 2
> tasks_lost_rack_aai 3
>
> Expected metrics:
> tasks_lost_rack_aaa 0
> tasks_lost_rack_aab 3
> tasks_lost_rack_aac 0
> tasks_lost_rack_aad 0
> tasks_lost_rack_aae 4
> tasks_lost_rack_aaf 0
> tasks_lost_rack_aag 0
> tasks_lost_rack_aah 2
> tasks_lost_rack_aai 3

--
This message was sent by Atlassian JIRA
(v6.2#6252)
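
For anyone alerting on these metrics in the meantime, one possible consumer-side workaround is to zero-fill the missing racks outside the scheduler. The sketch below is not part of Aurora. It assumes /vars output is plain "name value" lines as in the sample above, and that a rack inventory is available from some other source. The names parse_lost_by_rack, fill_known_racks and KNOWN_RACKS are hypothetical.

```python
PREFIX = "tasks_lost_rack_"


def parse_lost_by_rack(vars_text):
    """Extract {rack: lost_count} from 'name value' lines (assumed format)."""
    lost = {}
    for line in vars_text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[0].startswith(PREFIX):
            continue  # not a tasks_lost_rack_XXX line
        try:
            lost[parts[0][len(PREFIX):]] = int(parts[1])
        except ValueError:
            continue  # skip values that are not integers
    return lost


def fill_known_racks(lost, known_racks):
    """Zero-fill racks from the inventory that the scheduler did not report.

    Also returns racks the scheduler reported that are missing from the
    inventory, so a stale inventory is visible rather than silently ignored.
    """
    filled = {rack: lost.get(rack, 0) for rack in known_racks}
    unknown = sorted(set(lost) - set(known_racks))
    return filled, unknown


# Sample metrics copied from the issue description.
SAMPLE = """\
tasks_lost_rack_aab 3
tasks_lost_rack_aae 4
tasks_lost_rack_aah 2
tasks_lost_rack_aai 3
"""

# Hypothetical inventory; in practice this comes from outside the scheduler.
KNOWN_RACKS = ["aaa", "aab", "aac", "aad", "aae", "aaf", "aag", "aah", "aai"]

filled, unknown = fill_known_racks(parse_lost_by_rack(SAMPLE), KNOWN_RACKS)
for rack in sorted(filled):
    print(f"{PREFIX}{rack} {filled[rack]}")
```

This keeps the "which racks exist" question with whatever system owns the inventory, which fits the point that the scheduler may only know about certain racks intermittently.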